1  Background

The research program on advanced concepts and methods of approximate reasoning seeks to establish clear formal
foundations  that  advance  the  understanding  of  approximate  reasoning  methodologies.  The  approaches  that 
are  being  studied  are  fundamental  techniques  for  the  analysis  of  imprecise,  uncertain,  and  unreliable  data 
that  are  applicable  in  a  wide  variety  of  important  contexts. 

In  particular,  we  want  to  identify  and  study  frameworks  that  facilitate  the  comparison  of  the  features  of 
each  approach  allowing  the  determination  of  its  utility  in  the  solution  of  specific  problems.  Our  research  also 
seeks  to  broaden  the  scope  of  applicability  of  existing  methods  by  consideration  of  approximate  reasoning 
mechanisms that, going beyond the mere extension of classical deductive techniques, seek to develop intelligent
systems capable of performing inductive (i.e., learning), abductive (i.e., discovery or explanation), and
analogical  (i.e.,  similarity-based)  functions.  Furthermore,  we  are  interested  in  expanding  the  scope  of  our 
knowledge  sources  beyond  behavioral  knowledge  (e.g.,  expert-generated  rules)  and  current  observations,  to 
include  historical  databases  of  relevant  experience. 

2  U.S.  Air  Force  Relevance 

The  questions  addressed  by  this  program  of  research  are  related  to  basic  issues  of  knowledge  and  information 
and, as such, applicable results will have a wide impact across a variety of important applications of USAF
interest. 

Practically  every  important  real-life  problem  is  characterized  by  the  presence  of  information  that  is  not 
totally  precise,  certain,  or  credible.  These  undesirable  knowledge  features  are  often  found  in  the  military 
domain  where  the  size  and  complexity  of  systems,  coupled  with  the  presence  of  agents  actively  seeking  to 
deny  and  falsify  information,  renders  their  precise  observation  difficult  or  impossible. 

The  need  to  process  imprecise  and  uncertain  knowledge  is  obvious  in  military  intelligence  problems, 
where  the  objectives  are  situation  assessment  and  decision-support  on  the  basis  of  the  information  provided 
by  multiple  items  of  evidence  that,  typically,  are  imprecise,  incomplete,  and  of  limited  reliability.  In  many 
other problems of Air Force interest, however, availability of tools for approximate reasoning (including
methods  to  determine  applicability  and  usefulness  of  specific  techniques)  is  of  paramount  importance. 

Probabilistic reasoning, for example, is a key element of the command and control process beyond situation
assessment, due to its direct relevance to issues such as the determination of the viability of missions
and the reliability of information sources and control chains. In a broader context, probabilistic analysis
is an essential tool in the failure-diagnosis and reliability-analysis problems that are commonly found in any
organization that utilizes large-scale systems.

Possibilistic reasoning methods, because of their relations with analogical reasoning (which were elaborated
and clarified in the task being described), are also of direct relevance to a myriad of problems of
interest.  Situation  analysis,  plan  construction  (e.g.,  mission  planning),  and  system  design  are  just  a  few  of 
the  potential  applications  of  methods  that  exploit  databases  of  historical  experience  to  determine  solutions 
to  new  problems.  For  example,  in  a  command  and  control  application,  lessons  learned  in  previous  situations 
may  be  directly  retrieved  and  analyzed  to  determine  courses  of  action  that  are  applicable  in  the  current 
context.  Similarly,  system  design  (e.g.,  an  aircraft  subsystem)  might  be  considerably  simplified  by  use  of 
similarity-based  tools  that  suggest  plausible  design  choices  on  the  basis  of  existing  knowledge. 

Beyond these applications of “case-based reasoning,” recent experience with the development of large-scale
controllers based on possibilistic logic indicates that this type of reasoning leads to the development of
autonomous,  robust  controllers  for  unstable  systems.  Among  these,  the  control  of  active  flexible  wings  using 
a  fuzzy-logic  approach  (being  currently  considered  by  Rockwell  International)  deserves  special  mention  due 
to  its  USAF  relevance.  Similar  controllers  might  be  also  conceivably  used  to  stabilize  autonomous  walking 
robots  and  to  plan  their  activities. 


3  Accomplishments 

The  major  portion  of  our  investigative  effort  was  devoted  to  the  development  of  a  unified  framework  for  the 
description  of  approximate  reasoning  methods  that  facilitates  the  study  of  their  fundamental  characteristics. 
This objective was attained by consideration of structures, defined in spaces of possible worlds, that measure
either  the  relative  size  of  certain  subsets  (for  probabilistic  methods)  or  the  similarity  between  possible  states 
(for  possibilistic  methods). 

Possible  worlds  are  formalizations  of  the  notion  of  possible  state  or  behavior  of  a  system  (e.g.,  the  possible, 
but  typically  unknown,  situation  in  a  battlefield,  possibly  encompassing  its  potential  modifications  in  time). 
Using  this  concept,  an  approximate  reasoning  problem  may  be  described  as  one  where  available  evidence 
(e.g.,  battlefield  intelligence)  is  insufficient  to  determine  if  the  actual  state  of  the  world  lies  among  those 
conceivable possibilities (i.e., possible worlds) where a statement (a “hypothesis”) about the system is true
(e.g.,  whether  a  SAM  battery  is  currently  at  a  specific  location). 

The  major  contribution  of  the  research  performed  during  the  reporting  period  has  been  the  interpretation 
of possibilistic methods in terms of similarity functions between possible worlds. The formal results derived
in this research, summarized in the paper “The Semantics of Vague Knowledge” (enclosed as an integral
part of this report), show that possibilistic methods are substantially different in nature from
their  probabilistic  counterparts.  Furthermore,  as  discussed  in  detail  in  that  work,  these  results  have  shown 
that  all  major  technologies  proposed  for  the  analysis  of  imprecise  information,  including  nonmonotonic  logic 
and  “qualitative  reasoning”  approaches,  may  be  easily  described  and  understood  in  terms  of  models  based 
on  possible  worlds. 

For  example,  probabilistic  methods  may  be  characterized  as  being  concerned  with  the  estimation  of 
measures  of  the  sets  of  possible  worlds  that  are  both  compatible  with  the  evidence  and  are  such  that  the 
hypothesis  is  true.  Since  any  proposition  is  equivalent  to  a  set  of  possible  worlds,  these  set  measures  are 
usually  estimated  by  the  past  frequency  of  truth  of  the  hypothesis  under  similar  circumstances.  Probabilistic 
assessments  describe  therefore  the  “tendency”  or  “propensity”  of  a  system  to  behave  in  certain  ways  (for 
example,  to  break  down  after  so  many  hours  of  operation).  Except  in  extreme  cases,  these  assessments  do 
not  assert  that  the  hypothesis  is  true  or  false  but  rather  that  there  is  a  likelihood  (expressed  numerically) 
or  chance  that  the  hypothesis  will  be  true. 
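
The frequency-based estimate described above can be illustrated with a minimal sketch. The record fields, the sample data, and the function name `estimate_probability` are hypothetical and serve only to illustrate the idea: the estimate is the fraction of past cases compatible with the evidence in which the hypothesis was also true.

```python
from fractions import Fraction

# Hypothetical history: each record describes one past situation (a "possible
# world" that was actually observed) and whether the hypothesis held in it.
records = [
    {"vibration": "high", "failed": True},
    {"vibration": "high", "failed": True},
    {"vibration": "high", "failed": False},
    {"vibration": "high", "failed": True},
    {"vibration": "low", "failed": False},
]

def estimate_probability(records, evidence, hypothesis):
    """Relative frequency of `hypothesis` among records compatible with `evidence`."""
    compatible = [r for r in records if evidence(r)]
    if not compatible:
        return None  # no comparable past circumstances to count
    favorable = [r for r in compatible if hypothesis(r)]
    return Fraction(len(favorable), len(compatible))

p_fail = estimate_probability(
    records,
    evidence=lambda r: r["vibration"] == "high",
    hypothesis=lambda r: r["failed"],
)
print(p_fail)
```

The result is a number between 0 and 1 that describes a tendency; it does not assert that the hypothesis is true or false in the present case.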

Possibilistic  methods,  on  the  other  hand,  are  concerned  with  the  identification  of  statements  that  are 
true and that resemble, in some respect, the hypothesis. They are based on measures (metrics) that
describe how “similar” or “close” pairs of possible worlds are, rather than on measures that characterize the
“size” of subsets of possible worlds. These metrics formally capture the notion that two possible states of
affairs are similar in that certain propositions that are true in one resemble those that are true in the other
(e.g., “the pressure is greater than 100 lb./sq.in.” and “the pressure is greater than 110 lb./sq.in.”). While
a  probabilistic  statement  describes  tendency  towards  truth  (e.g.,  “the  probability  of  runway  destruction  is 
80%”),  the  possibilistic  answer  asserts  the  truth  of  a  related  proposition  (e.g.,  “the  runway  will  be  definitely 
inoperative  for  all  aircraft  of  type  A  or  type  B”). 

Contrary  to  the  opinions  held  by  some,  the  results  of  our  research  show  that  possibilistic  methods  are 
not  easily  interpreted  or  explained  by  probabilistic  structures.  Possibilistic  structures,  on  the  other  hand, 
have  been  shown  to  be  close  in  character  to  the  discretizations  used  in  “qualitative  reasoning,”  where  scalar 
variables  are  substituted  by  coarser  frameworks  that  replace  all  numbers  by  three  possible  values:  zero, 
negative,  and  positive.  The  possibilistic  schemes  generalize  this  idea  in  that  significant  groups  of  variable 
values  (or  “granules”)  may  be  arbitrarily  defined  and  in  that  these  granules  are  “fuzzy,”  in  the  sense  that 
whenever  the  value  of  the  variable  is  “close”  to  some  typical  value  in  the  granule,  results  applicable  to  the 
typical  value  may  be  “extrapolated”  to  the  actual  value. 

Furthermore,  our  research  indicates  that  it  is  also  improper  to  regard  probabilistic  and  possibilistic 
methods  as  competitive  technologies.  Since  their  aims  and  output  are  fundamentally  different,  the  proper 
attitude  is  to  regard  these  methodologies  as  complementary  tools  that  help,  in  different  ways,  in  assessing 
the  state  of  the  world. 

The  formal  model  leading  to  our  results  is  a  Kripke-type  semantic  model  with  the  customary  relation  of 
accessibility replaced by multiple relations indexed by a parameter $\alpha$. Although it is easier to think of this
parameter in numerical terms, our model is very general, allowing the use of symbolic, nonnumeric scales to
assess  resemblance.  Furthermore,  our  formulation  justifies  certain  formal  requirements  that  any  similarity 
measure  must  obey.  The  major  highlights  of  the  model  are  described  in  the  technical  note  “On  the  Semantics 
of  Fuzzy  Logic,”  which  is  enclosed  as  part  of  this  report.  These  developments  may  be  summarized  as  follows: 

•  Definition  of  multiple  accessibility  relations  by  a  similarity  function  that  defines  a  metric  in  a  space  of 
possible  worlds  (thus  allowing  use  of  “continuity”  arguments  to  “extrapolate”  results  from  one  world 
to  those  that  are  close  to  it) 

•  Generalization  of  the  modal  notion  of  possibility  to  a  graded  notion  of  possibility  that  is  related  to  the 
so-called  “de  re”  interpretation  of  conditional  statements  in  modal  logic. 

•  Characterization  of  similarities  as  being  defined  either  from  the  joint  viewpoint  of  several  variables  or 
descriptors  (joint  similarities),  or  being  limited  to  considerations  from  some  limited  respect  (marginal 
similarities). 

•  Identification  of  relationships  of  marginal  similarities  with  topological  and  metric  concepts  (mainly, 
the so-called “Hausdorff” distance).

•  Definition  of  unconditioned  and  conditional  possibility  functions  from  similarity  functions. 

•  Formal justification of the generalized modus ponens of Zadeh as an extension of the corresponding
classical inferential rule. This central result generalizes the transitivity of set inclusion that makes the
modus ponens valid (i.e., if A is a subset of B and if B is a subset of C, then A is a subset of C) into a
relationship between the sizes of the “neighborhoods” of sets that include each other (e.g., if A is in a
neighborhood of size $\alpha$ of B, and if B is in a neighborhood of size $\beta$ of C, then A is in a neighborhood
of size $\gamma = f(\alpha, \beta)$ of C). The generalized modus ponens, therefore, combines logical principles with
the properties of a metric relation to provide a sound, correct form of logical “extrapolation.”

•  Characterization  of  the  problem  (important  in  practice)  of  derivation  of  similarity  functions  from 
possibility  functions. 
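
The neighborhood form of the generalized modus ponens listed above can be checked numerically in a small sketch. Everything specific here is an assumption made for illustration and is not taken from the report: worlds are points on the real line, similarity is `1 - distance` clipped at zero, the combination function `f` is taken to be `max(0, alpha + beta - 1)` (which follows from the triangle inequality for this particular similarity), and "A is in a neighborhood of size alpha of B" is read as "every element of A is at least alpha-similar to some element of B."

```python
def similarity(x, y):
    """Assumed similarity between two worlds (points): 1 at distance 0, 0 at distance >= 1."""
    return max(0.0, 1.0 - abs(x - y))

def combine(alpha, beta):
    """Assumed combination function f(alpha, beta) for this similarity."""
    return max(0.0, alpha + beta - 1.0)

def inclusion_degree(a_set, b_set):
    """Largest degree d such that every element of a_set is d-similar to some element of b_set."""
    return min(max(similarity(a, b) for b in b_set) for a in a_set)

# Hypothetical finite sets of worlds.
A = [0.0, 0.1]
B = [0.2, 0.3]
C = [0.5, 0.6]

alpha = inclusion_degree(A, B)   # A lies in a neighborhood of size alpha of B
beta = inclusion_degree(B, C)    # B lies in a neighborhood of size beta of C
gamma = inclusion_degree(A, C)   # actual degree to which A lies near C

# The guaranteed lower bound f(alpha, beta) never exceeds the actual degree.
print(gamma >= combine(alpha, beta) - 1e-12)
```

When `alpha` and `beta` are both 1 the bound reduces to ordinary transitivity of set inclusion, which is the classical modus ponens case.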

In addition to our basic research in the semantics of possibilistic approaches, we have continued our
research into the definition and utilization of conditional belief measures in the Dempster-Shafer calculus
of  evidence.  Applicable  formulas  are  currently  being  evaluated  on  the  basis  of  their  applicability  to  general 
cases  (in  general,  the  combination  of  conditioned  and  unconditioned  evidence  does  not  lead  to  functions  that 
are  compatible  with  the  axioms  of  the  evidential  calculus)  and  in  terms  of  the  computational  complexities 
of  the  algorithms  required  for  their  evaluation. 

4  Status  and  Plan


Our  basic  semantic  model  of  fuzzy  logic  is  complete.  Our  immediate  concern  is  the  evaluation  of  alternative 
formulations  that  rely  on  classes  of  similarity  functions  that  satisfy  certain  important  properties  (mainly 
assuring  that  the  value  of  the  similarity  between  two  objects  from  some  respect,  like  color,  be  always  higher 
than  the  value  of  the  similarity  between  those  objects  from  multiple  respects,  e.g.,  color  and  shape). 
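
The property mentioned above (similarity from one respect is never lower than similarity from several respects) can be illustrated with a short sketch. The descriptors `hue` and `aspect`, the particular marginal similarity functions, and the choice of the minimum as the joint similarity are all assumptions made for illustration; the report does not prescribe them.

```python
def sim_color(a, b):
    """Assumed marginal similarity with respect to color (hue difference scaled to [0, 1])."""
    return max(0.0, 1.0 - abs(a["hue"] - b["hue"]) / 180.0)

def sim_shape(a, b):
    """Assumed marginal similarity with respect to shape (aspect-ratio difference)."""
    return max(0.0, 1.0 - abs(a["aspect"] - b["aspect"]))

def sim_joint(a, b):
    """Assumed joint similarity: objects are jointly similar only as far as
    they are similar in every respect considered, hence the minimum."""
    return min(sim_color(a, b), sim_shape(a, b))

# Two hypothetical objects.
x = {"hue": 30.0, "aspect": 1.0}
y = {"hue": 60.0, "aspect": 1.5}

print(sim_color(x, y), sim_shape(x, y), sim_joint(x, y))
```

With this choice the required ordering holds by construction, since a minimum can never exceed any of its arguments.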

Our long-term plans, however, focus on the important problem of deriving similarity values from possibility
distributions. In our model, possibility distributions may be thought of as similarities from some respect
(e.g., “pressure”) that measure how close a particular situation (e.g., “pressure greater than 50 lb./sq.in.”) is
to a set of “typical examples” (e.g., “pressure greater than 100 lb./sq.in.”). This measure of object-to-set
resemblance defines a “linguistic value” (e.g., “very high pressure”) that may be used as the basis to extrapolate
from statements that are true in any prototype to statements that are true in the particular case under
consideration.
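
A minimal sketch of this object-to-set resemblance follows. The prototype values, the linear similarity with a 100 lb./sq.in. scale, and the use of the maximum over prototypes are assumptions chosen for illustration; they are not formulas taken from the report.

```python
# Hypothetical "typical examples" of very high pressure, in lb./sq.in.
PROTOTYPES = [100.0, 110.0, 125.0, 150.0]

def similarity(p, q, scale=100.0):
    """Assumed similarity between two pressures: 1 when equal, 0 when `scale` or more apart."""
    return max(0.0, 1.0 - abs(p - q) / scale)

def possibility(pressure, prototypes=PROTOTYPES):
    """Degree to which `pressure` resembles the closest typical example."""
    return max(similarity(pressure, proto) for proto in prototypes)

for value in (0.0, 50.0, 90.0, 120.0):
    print(value, possibility(value))
```

Under these assumptions a pressure of 50 lb./sq.in. resembles the prototypes only halfway, so conclusions that hold for the prototypes extend to it only in a correspondingly weakened form.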

The role of similarities in our formulation, however, is primarily conceptual: they are intended to explain a complex
notion  (i.e.,  possibility)  in  terms  of  a  more  primitive  concept  (i.e.,  similarity).  Although  our  formulas  permit 
the  computation  of  possibility  values  from  similarity  values,  similarities  (representing  proximity  from  the 
joint viewpoint of several respects) will be derived, in practical applications such as similarity-driven case-based
reasoning, from possibility distributions (characterizing proximity between sets of objects from a
limited  perspective).  For  this  reason,  it  is  our  intent  to  focus  future  attention  on  the  problems  associated 
with  the  derivation  of  similarity  functions  from  possibility  distributions.  Our  point  of  departure  is  existing 
work  linking  similarity  relations  with  certain  classes  of  subsets  of  possible  worlds.  The  derivation  of  specific 
formulas  must  await,  however,  the  evaluation  of  models  based  on  restricted  classes  of  similarity  functions 
characterized  both  by  desirable  theoretical  properties  (such  as  mentioned  above)  and  by  their  utility  in 
practical  applications  (primarily,  case-based  reasoning). 

In  addition,  we  plan  to  utilize  the  formulas  and  relations  derived  in  our  semantic  model  to  further  extend 
possibilistic  calculi  by  identification  of  relationships  between  distributions  that  may  be  used  to  compute 
some  of  them  as  a  function  of  others  (e.g.,  conditional  possibility  distributions  from  joint  and  marginal 
unconditional  distributions).  In  order  to  assess  the  applicability  and  efficiency  of  algorithms  based  on  such 
relations,  we  plan  to  develop  (in  collaboration  with  Dr.  Leonard  Wesley  of  the  Artificial  Intelligence  Center, 
SRI  International)  a  computational  environment  (ANALOG)  for  the  testing  of  similarity-based  analogical 
reasoning  procedures.  As  part  of  these  activities,  Dr.  Wesley  is  currently  engaged  in  the  collection  of  suitable 
databases  that  may  be  used  in  our  computational  experiments. 

5  Conference  Participation.  Publications 

1.  E.H.  Ruspini.  The  Semantics  of  Vague  Knowledge.  Presented  at  the  Second  International  Conference 
on  the  Processing  and  Management  of  Uncertainty  by  Expert  Systems,  Urbino,  Italy,  1988. 

2.  E.H.  Ruspini.  Generalized  Similarity  Relations  and  the  Semantics  of  Fuzzy  Logic.  Presented  at  the 
Workshop  on  Approximate  Reasoning  in  Expert  Systems,  Blanes,  Spain,  1989. 

3.  E.H.  Ruspini.  The  Semantics  of  Fuzzy  Logic.  Presented  at  the  Third  International  Fuzzy  Systems 
Association Conference, Seattle, Washington, 1989.

4.  E.H. Ruspini participated as an invited discussant in the Workshop on Nonstandard Logics, Rocamadour,
France, 1988. His discussion of the position papers on approximate reasoning presented by panelists
will appear in a volume to be published by Academic Press in 1989.

5.  E.H.  Ruspini  participated  as  a  reviewer  in  the  DRUMS/RP3  program  sponsored  by  the  European 
Economic  Community. 

6.  E.H. Ruspini. The Semantics of Vague Knowledge. Revue Internationale de Systémique, to appear,
1990. 

7.  E.H. Ruspini. On the Semantics of Fuzzy Logic. Technical Note No. 475, SRI International, Menlo
Park,  California,  November  1989. 

In addition, the principal investigator was the recipient of a Fulbright Fellowship to conduct a course in
Approximate Reasoning in Spain in the spring of 1989.

RESEARCH  PUBLICATIONS


The  Semantics  of  Vague  Knowledge 

Enrique  H.  Ruspini* 

Artificial  Intelligence  Center 
SRI  International 
Menlo  Park,  California,  U.S.A. 


Abstract 

This  paper  is  devoted  to  the  discussion  of  basic  issues  related  to  the  meaning  of 
imprecise,  uncertain,  and  vague  knowledge,  its  manipulation,  and  its  utilization.  The 
informational  deficiencies  that  characterize  this  type  of  knowledge  are  described  in  terms 
of the impossibility to determine, without ambiguity, the truth value of certain hypotheses,
i.e., statements of interest to those seeking to understand the state and behavior
of  a  real-world  system. 

Using  a  “possible  worlds”  perspective,  this  inability  may  also  be  characterized  by 
the  presence  of  conceivable  (i.e.,  consistent  with  evidence)  circumstances  where  the 
proposition  is  true,  and  of  equally  admissible  circumstances  where  it  is  false.  From  such 
a  viewpoint,  approximate  reasoning  techniques  are  presented  as  producers  of  correct 
descriptions  of  properties  of  the  class  of  possible  worlds  that  are  consistent  with  observed 
evidence, rather than as the results of some relaxation of the notion of “truth-value.”

Two  major  classes  of  approximate  reasoning  systems  are  identified  —  probabilistic 
and  possibilistic  —  and  their  major  conceptual  differences  are  described.  The  theoretical 
underpinnings  of  each  methodological  approach  are  described,  and  the  current  level  of 
understanding  of  their  major  functional  structures  and  concepts  is  discussed. 

The discussion of probabilistic approaches encompasses both subjectivist and objectivist
perspectives, and also includes nonclassical approaches (such as the Dempster/Shafer
calculus of evidence) that are related to the notion of interval probabilities.
The discussion of possibilistic approaches, on the other hand, stresses the relations between
the concepts of possibility and similarity that have been recently studied by the
author.

Finally, nonmonotonic logic and qualitative process theory concepts are briefly examined
from the perspective of possible-world semantics.

*The views and conclusions contained in this paper are those of the author and should not be interpreted as representative of the
official  policies,  either  express  or  implied,  of  the  Air  Force  Office  of  Scientific  Research  or  the  United  States 
Government. 


1  Introduction 

This paper is devoted to the discussion of basic issues relevant to the purpose of approximate
reasoning methodologies with emphasis on the meaning of their basic structures and
concepts. Approximate reasoning systems may be briefly characterized as automated agents
(e.g., computer programs and systems) that seek to identify the state of a real-world system
on the basis of knowledge that is imprecise (i.e., available information does not possess
the desired degree of detail) and uncertain (i.e., we are not absolutely certain about
the correctness of such information).

Under  these  conditions  it  is  possible,  usually  easily  so,  to  conceive  of  situations  where, 
given available information, some statement about the real world is true. Under other conceivable
circumstances, equally admissible given the available knowledge, that statement
is false. In a majority of weather-forecasting applications, for example, the information
collected  by  a  variety  of  sensors  is  often  insufficient  to  determine  if  rain  will  fall  at  a  given 
location  at  a  given  future  time.  Depending  on  the  evolution  and  interaction  of  the  different 
components  and  subsystems  of  the  atmosphere,  rain  may  actually  fall  or  may  not  fall. 

The  importance  and  ubiquity  of  problems  characterized  by  information  that  is  imprecise 
and  uncertain  make  the  development  of  so-called  “approximate  reasoning”  systems  one  of 
the most important technological requirements to be met by artificial intelligence procedures
that, going beyond the foundations of classical deductive techniques, must cope with
the undesirable features of the underlying knowledge. The current lack of understanding of
the principles that underlie these methodologies, combined with their present state of technological
development (often exemplified by the use of questionable “ad hoc” methods),
has led to considerable controversy among practitioners who have, in recent years, debated
their  relative  advantages  and  disadvantages. 

The  absence  of  a  formal  unified  framework  for  the  description  of  the  underlying  concepts 
and  structures  of  various  applicable  technologies  has  complicated  their  understanding  and 
comparison,  making  it  nearly  impossible  to  develop  even  a  partial  consensus  about  the 
relative  applicability  of  each  methodology.  Lacking  formal  structures  to  guide,  in  a  rigorous 
fashion,  the  use  of  terms  such  as  “probability”  and  “possibility,”  each  capable  of  being 
interpreted  in  a  variety  of  ways,  it  is  nearly  impossible  to  evaluate  arguments  advanced  for 
or  against  particular  positions.  Furthermore,  problems  such  as  the  determination  of  the 
validity  of  the  output  of  approximate  reasoning  systems,  or  of  their  usefulness  in  specific 
circumstances  (or  even  establishing  the  meaning  of  such  notions),  have  remained  largely 
unaddressed.

This paper reports on the results of research toward the development of firm foundations
for the unified description of approximate reasoning methods, with emphasis on the
interpretation  of  their  underlying  concepts  and  structures.  The  formal  framework  derived  in 
this  research  is  based  on  the  notion  of  “possible  worlds”  as  introduced  in  modal  logics  [15]. 
In this paper, our attention will be mainly focused on various types of probabilistic reasoning methods,
discussed in Section 2, and possibilistic reasoning methods, presented in Section 3. Included is
a discussion of relations with qualitative and nonmonotonic reasoning methods, which are
also  concerned  with  problems  associated  with  imprecise  and  uncertain  information.  Before 
presenting  such  issues,  it  is  important  to  consider  the  general  nature  of  the  approximate 
reasoning  problem. 

1.1  The  Nature  of  Approximate  Reasoning 

The goal of any system that relies on inference techniques is to assign a truth value, which
may be either true or false, to statements (called hypotheses) about the state or
behavior of a real-world system. Due to its very nature, however, the approximate reasoning
problem  is  unsolvable,  because  of  either  fundamental  or  practical  limitations. 

Available information is often insufficient to determine, by means of conventional inference
procedures, if a hypothesis is true or false. In some problems, the impossibility is of
a  more  practical  nature:  there  are  not  enough  resources  (e.g.,  memory,  computer  time)  to 
determine  if  the  hypothesis  is  true  or  not. 

Whether  the  impossibility  is  fundamental  or  practical,  the  important  fact  is  that,  as 
posed, an approximate reasoning problem is not solvable. Information constrains the possible
truth values of hypotheses but rarely restricts them to unique values. In general, those
constraints determine a set of possible solutions. Each such solution is an assignment of
truth values that is logically consistent with observed facts and system knowledge (typically
expressing laws of system behavior). For example, an observation, made several days
earlier  about  the  location  of  an  automobile  on  a  highway,  augmented  by  knowledge  about 
the  capability  of  such  a  vehicle  to  proceed  at  certain  speeds  through  some  roads,  may  be 
sufficient  to  determine  a  set  of  its  possible  current  locations,  but  it  will  usually  be  unable 
to  pinpoint  any  one  of  them  as  the  only  possible  place  where  the  vehicle  could  be  at  the 
present  time. 

The solution of an approximate reasoning problem is therefore a set of possibilities¹
that are logically consistent with available information. In this document we use the term
possible worlds, which is borrowed from logic (specifically modal logic), to denote each such
possibility  [4]. 

In  most  approximate  reasoning  problems  it  is  not  practically  possible  to  describe  a 
set  of  possible  worlds  to  an  acceptable  level  of  detail.  Different  methodologies  have  been 
developed,  however,  to  describe  some  properties  of  the  set  of  possible  solutions  or,  more 
generally, certain constraints on values that measure such properties. For example, probabilistic
methods seek to identify the probability distribution of some of the variables that are
used  to  characterize  each  possible  world.  As  we  will  see,  often  even  this  level  of  detail  may 
not  be  attained,  and  the  best  we  can  do  is  to  indicate  that  certain  probability  distribution 
values  are  possible  while  others  are  not  (e.g.,  the  probability  of  rain  will  be  between  60% 
and  80%). 

¹ Note that this use of the term possibility is different from that used below in connection with possibilistic
reasoning. 

1.2  Possible  Worlds


Possible  worlds,  as  informally  described  above,  are  the  solutions  of  an  approximate  reasoning 
problem  that  are  consistent  with  existing  information  and  knowledge.  In  many  problems, 
each of these solutions corresponds to the state of a real-world system at a given instant in
time.  In  other  examples,  each  possible  world  may  also  include  descriptions  of  past,  present, 
and  future  (predicted)  states  of  the  real  world.  In  some  planning  and  control  problems  (e.g., 
autonomous  robot  path  and  activity  planning),  each  possible  world  may  correspond  to  a 
description  of  the  characteristics  of  a  plan  formulated  by  rational  agents  seeking  to  control 
certain  aspects  of  system  behavior  together  with  its  resulting  effects  on  the  planned  system 
and  its  environment. 

The characteristics and complexity of each possible solution are, therefore, highly dependent
on the particular real-world system being studied and the analytical requirements
of  the  users  of  the  approximate  reasoning  system.  Although,  as  we  have  just  seen,  this 
diversity  of  needs  leads  to  widely  different  types  of  possible  worlds,  there  exists  a  high-level, 
logical  characterization  of  the  concept  of  possible  world  in  terms  of  the  possible  truth  of 
statements  (propositions)  about  the  real-world  system  being  studied.  This  characterization 
was  derived  by  Carnap  [5],  who  also  proposed  a  conceptual  procedure  for  the  generation  of 
descriptions  of  all  possible  states  of  affairs. 

While Carnap considered first-order-logic systems in his characterization of the concept,
we shall confine ourselves to a simpler, proposition-based description that captures
the  essence  of  his  construction  procedure.  Before  proceeding  to  its  discussion  it  is  very 
important  to  remark,  however,  that  the  Carnap  procedure  is  a  conceptual  process  intended 
primarily  to  formalize  the  notion  of  possible  world  while  providing  clear  foundations  for 
the discussion of other concepts (e.g., possible truth). The combinatorial explosion associated
with Carnap’s process makes the actual enumeration and representation of
possible-world spaces infeasible in real-life problems.

The procedure of Carnap starts with consideration of a finite number of ground propositions

$$p_1, p_2, \ldots, p_m$$

that  describe  characteristics  of  a  real-world  system.  For  example,  in  a  weather-forecasting 
application,  these  propositions  may  include  declarative  knowledge  statements  such  as:  “The 
total  rainfall  will  be  less  than  1  cm.”  These  statements  are  intended  to  capture  those  aspects 
of  the  behavior  of  the  world  that  are  important  to  analysts  and  to  identify  that  behavior 
to  the  necessary  degree  of  precision. 

After these propositions have been identified, the process proceeds to consider all the
conjunctions of the type²

$$p_1 \wedge p_2 \wedge \neg p_3 \wedge \ldots \wedge p_m ,$$

where each of the ground propositions appears once, either as given or negated. If m ground
propositions have been identified, this process leads to $2^m$ conjunctions. We eliminate from
this set conjunctions that represent logical impossibilities such as, for example, “the total
rainfall will be less than 1 cm and the total rainfall will be more than 3 cm,” and those
that are logically inconsistent with several prespecified propositions $a_1, a_2, \ldots, a_l$
(axioms about the behavior of the system being studied) that are always assumed to be true.

² Throughout this paper we use the conjunction symbol $\wedge$ to mean “and,” the disjunction
symbol $\vee$ to mean “or,” and the negation symbol $\neg$ to mean “not.”
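The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three ground propositions and the two axioms are hypothetical, chosen to echo the rainfall example, and the exhaustive enumeration is practical only for very small m.

```python
from itertools import product

# Hypothetical ground propositions for a toy weather example.
PROPOSITIONS = ("rain_lt_1cm", "rain_gt_3cm", "cloudy")


def axioms_hold(world):
    """Hypothetical consistency check for one truth assignment."""
    # Logical impossibility: rainfall cannot be both < 1 cm and > 3 cm.
    if world["rain_lt_1cm"] and world["rain_gt_3cm"]:
        return False
    # Illustrative domain axiom: more than 3 cm of rain requires clouds.
    if world["rain_gt_3cm"] and not world["cloudy"]:
        return False
    return True


def carnapian_universe(propositions, consistent):
    """Enumerate all 2**m truth assignments and keep the consistent ones."""
    worlds = []
    for values in product((True, False), repeat=len(propositions)):
        world = dict(zip(propositions, values))
        if consistent(world):
            worlds.append(world)
    return worlds


universe = carnapian_universe(PROPOSITIONS, axioms_hold)
print(len(universe), "possible worlds out of", 2 ** len(PROPOSITIONS))
```

Each surviving dictionary plays the role of one conjunction in which every ground proposition appears either as given (True) or negated (False).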

The remaining members of this propositional set, or Carnapian Universe, are called
possible worlds. Each possible world is a description (to the maximum level of detail allowed
by our original set of ground propositions) of a possible, although typically unknown, state
of the system under study. Each such description is consistent both with the laws of logic
and with the axioms that constrain system behavior and may be thought of as a function
(called a valuation) that assigns to each relevant proposition a truth-value that is either
“true” or “false.” Similarly, possible worlds may be thought of as sets of propositions that
contain all propositions that are true and the negation of those that are false, as illustrated in
Figure 1, where each possible world is revealed, through the help of a hypothetical “logical”
microscope, as a collection of true propositions. Furthermore, each possible world differs
from any other in that at least one proposition that is true in one world is false in the other.

Figure 1: The Carnapian Universe.

From this logical perspective, which is particularly useful in artificial intelligence applications,
the observations in a body of evidence, which correspond to the truth of certain
propositions, may be thought of as constraints on the subsets of possible worlds where the
state of the real-world system actually lies. Possible worlds that are logically consistent
with those propositions (said to be compatible with the evidence) are, generally, a proper
subset of the Carnapian universe of possibilities.

It is generally agreed that “stronger” or “better” evidence results in subsets of possible
worlds that are smaller, in some sense, than those resulting from “weak” evidence. The quality
of evidence, however, should be judged by a variety of standards. Among those, domain-dependent
criteria  are  usually  the  most  important  in  assessing  the  quality  of  informational  bodies.  In 
general,  it  is  desirable  that  the  evidence  be  such  as  to  allow  unambiguous  answers  to  certain 
questions  of  importance  (i.e.,  hypotheses).  To  rephrase  this  statement  with  the  help  of  the 
Carnapian  characterization,  it  is  desirable  that  the  evidence  be  such  that  propositions  of 
importance  be  true  (or  false)  for  every  possible  world  compatible  with  the  evidence,  rather 
than  true  for  some  and  false  for  others. 
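The role of evidence as a constraint on possible worlds can be illustrated with a minimal sketch. The two-proposition universe, the evidence predicates, and the helper names `compatible` and `status` are all hypothetical; the point is only that stronger evidence leaves fewer compatible worlds and may settle a hypothesis that weaker evidence leaves open.

```python
from itertools import product

# Hypothetical toy universe: every truth assignment to two propositions.
PROPOSITIONS = ("p", "q")
UNIVERSE = [dict(zip(PROPOSITIONS, values))
            for values in product((True, False), repeat=len(PROPOSITIONS))]


def compatible(universe, evidence):
    """Worlds in which every observation (a predicate on worlds) holds."""
    return [w for w in universe if all(obs(w) for obs in evidence)]


def status(worlds, hypothesis):
    """Classify a hypothesis over the worlds compatible with the evidence."""
    if not worlds:
        raise ValueError("evidence is inconsistent: no compatible worlds")
    truths = {hypothesis(w) for w in worlds}
    if truths == {True}:
        return "true"
    if truths == {False}:
        return "false"
    return "undetermined"  # true in some compatible worlds, false in others


hypothesis = lambda w: w["p"]
weak = [lambda w: w["p"] or w["q"]]      # "p or q" was observed
strong = weak + [lambda w: not w["q"]]   # additionally, "not q" was observed

weak_worlds = compatible(UNIVERSE, weak)
strong_worlds = compatible(UNIVERSE, strong)
print(len(weak_worlds), status(weak_worlds, hypothesis))
print(len(strong_worlds), status(strong_worlds, hypothesis))
```

The "undetermined" outcome is the situation that the text identifies as the approximate reasoning problem.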

As we have stressed before, however, an approximate reasoning problem is such that
the evidence is incapable of determining whether a hypothesis is true or false, as illustrated
in Figure 2. Approximate reasoning systems are concerned with the description of certain
properties of the set $\mathcal{E}$ of possible worlds that are consistent with the evidence, seeking
primarily to characterize the subsets $\mathcal{H} \cap \mathcal{E}$ and $\overline{\mathcal{H}} \cap \mathcal{E}$ of worlds compatible with the
evidence where a hypothesis is either true or false, respectively. The descriptions that they
provide, however, are of a substantially different nature for different approaches; they are not
all based on, or explained by, probabilistic notions, as is often erroneously claimed.

Figure 2: The Approximate Reasoning Problem. (Legend: worlds consistent with the evidence $\mathcal{E}$;
worlds logically inconsistent with the evidence; regions where the hypothesis is true and where it is false.)

1.3  Probabilistic  and  Possibilistic  Reasoning 

In this paper we will be concerned primarily with the two major types of approximate
reasoning methodologies that are being actively used to treat practical situation-assessment
and planning/decision problems. These methodologies are commonly said to be probabilistic
or possibilistic, respectively.

Probabilistic  methods  seek  to  describe  the  structure  of  a  set  of  possible  worlds  by  means 
of  certain  conditional  probability  distributions  (the  condition  being  the  actual  evidence  at 
hand).  If  these  distributions  are  considered  to  represent  the  tendency  or  propensity  of  the 
world  to  act  in  a  repetitive  fashion  that  may  be  described  by  a  frequency  of  occurrence, 
they  are  said  to  have  an  objectivist  interpretation;  if  they  represent,  on  the  other  hand,  the 
degrees  of  belief  (or  of  commitment  to  certain  courses  of  action)  of  certain  rational  agents, 
then  they  are  said  to  have  a  subjectivist  interpretation. 

Irrespective of the particular interpretation used, probabilistic reasoning methods are
concerned with the likelihood (either measured by previous experience or believed by an
agent) that a particular hypothesis will be true in a given situation. Save for exceptional
cases (i.e., probabilities equal to 0 or 1), no firm assurances are given to the user of any
probabilistic methodology about the actual state of the world or its behavior. The probabilistic
assessment is one of tendency and is primarily useful in the “long run,” that is, when
evaluated by criteria that take into account the aggregate performance of the approximate
reasoner over many situation-assessment and decision-aid examples.

Probabilistic  results  are  particularly  useful  in  organizations  such  as  insurance  companies 
or  gambling  houses,  where  success  is  evaluated  in  terms  of  a  population  of  examples  (i.e., 
all  insurance  policies  or  all  gambling  customers).  By  this  statement  we  do  not  mean  that 
probabilistic information is useless for single cases or “short runs.”³ Our point is that,
for  all  we  know,  the  hypothesis  may  be  true  or  may  be  false  (that  is  the  nature  of  the 
approximate  reasoning  problem).  Under  such  circumstances,  decisions  that  could  possibly 
lead  to  an  undesirable  state  of  affairs  may  deserve  to  be  analyzed  from  other  viewpoints. 

Possibilistic  reasoning,  on  the  other  hand,  seeks  to  describe  possible  worlds  in  terms  of 
their  similarity  to  other  sets  of  possible  worlds  by  placing  emphasis  on  assessments  that  may 
be assured to be valid in each particular case and situation. Rather than describing relative
proportions  (of  occurrence)  of  possible  worlds  where  a  hypothesis  of  interest  is  true  or  false, 
as  done  by  probabilistic  methods,  possibilistic  reasoning  seeks  to  describe  all  possible  worlds 
that  are  compatible  with  evidence,  in  terms  of  their  resemblance  to  members  of  certain  sets 
of  “exemplary”  or  “typical”  worlds. 

For  example,  a  probabilistic  method  may  determine  that  a  corporation  has  a  probability 
of  80%  of  exceeding  its  profit  goal  for  the  year.  This  assessment  is  not  an  assurance  that  such 
a  goal  will  be  attained.  It  does  provide,  however,  some  basis  for  subsequent  management 
policy.  While  there  is  a  chance  that  profits  will  fall  short  of  the  goal,  if  management 
policy  be  consistently  applied  in  every  fiscal  period,  then,  in  the  long  run,  proper  rational 
decisions  would  have  been  made  and  the  company  could  be  expected  to  prosper  (despite 
possible  occasional  setbacks).  A  possibilistic  method,  on  the  other  hand,  may  assert  that 
profits will amount to at least 70% of the goal figure. On some previously agreed similarity
scale such a statement may be translated into the possibilistic statement: “the possibility
of achieving the profit-goal is 0.7.” Note that the emphasis is on certainty and comparison
between statements rather than on likelihood and chance.

³ Our view, that decisions that are best in the “long run” may not be the same as those that are best in
single instances, does not agree with current subjectivist orthodoxy.

In general, possibilistic methods, which are strongly rooted in fuzzy set theory [41],
provide  assessments  such  as  “the  profit  will  be  adequate,”  indicating  that  the  predicted 
value  of  the  profit  will  have  a  similarity  greater  than  zero  (sometimes  possibilistic  techniques 
produce  specific  lower  bounds)  to  a  value  that  is  a  good  example  of  “adequate  gain.”  Often 
it  is  also  said  that  these  vague  statements  describe  the  degree  of  ease  by  which  the  concept 
“adequate”  matches  the  situation  at  hand.  The  ability  to  represent  vague  concepts  by 
possibility  distributions  —  attained  by  indicating  that  a  value  of  a  variable  matches  the 
vague  concept  to  a  degree  —  is  central  to  fuzzy  set  theory,  which  was  conceived  as  a  basis 
for the formal treatment of linguistic utterances as they are commonly found in everyday
discourse. 

In  summary,  we  may  say  that  the  approach  to  the  analysis  of  imprecise  and  uncertain 
information  that  is  used  by  any  approximate  reasoning  methodology  is  based  on  the  solution 
of a problem that is related to, but different from, the unsolvable problem of determining,
without  ambiguity,  the  truth  of  a  hypothesis.  In  the  probabilistic  case,  the  answers  provided 
consist  of  estimates  of  frequency  of  the  truth  of  the  hypothesis  in  similar  cases  as  determined 
by  prior  observation  (objectivist  interpretation)  or  degree  of  commitment  in  a  gamble  based 
on  the  actual  truth  of  the  hypothesis  (subjectivist  interpretation).  In  the  possibilistic  case, 
in  contrast,  the  answers  provided  assert  that  a  related,  similar,  hypothesis  is  true. 

2  Probabilistic  Reasoning 

Probabilistic  reasoning  methods  focus  on  the  description  of  the  relative  proportions  of  the 
occurrence  of  truth  or  falsehood  of  certain  hypotheses  under  certain  evidential  constraints. 
These constraints, representing the available evidence $\mathcal{E}$, condition the probabilities
$P(X = x \mid \mathcal{E})$ describing the frequency of occurrence of the value $x$ of the state variable $X$ when
$\mathcal{E}$ is true. Using again the Carnapian characterization, we may describe these techniques as
being  concerned  with  the  determination  of  the  probability  of  some  subsets  of  the  Carnapian 
universe  on  the  basis  of  the  probability  of  related  subsets. 

If possible worlds in the Carnapian universe correspond to individual combinations of
the values of n state variables $X_1, X_2, \ldots, X_n$, that is,

$$p_a = (X_1 = x_1) \wedge (X_2 = x_2) \wedge \ldots \wedge (X_n = x_n) ,$$

then, in general, probabilistic reasoning problems require the determination of either the
joint probability distribution

$$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid \mathcal{E})$$

or, alternatively, one of its marginal distributions on the basis of information consisting of
related marginal and conditional probability distributions.

2.1 Conventional Probabilistic Reasoning

Classical  probabilistic  techniques  rely  on  a  calculus  that  is  directly  derived  from  the  axioms 
of  probability  theory  and  that,  in  addition,  assumes  that  all  required  numerical  probability 
values  are  available,  either  as  the  result  of  prior  empirical  observation  (i.e.,  frequencies  of 
occurrence)  or  as  the  result  of  elicitation  of  personal  commitment  to  gambling  outcomes 
(“degrees of belief”).

The  rules  used  for  this  derivation  include  the  additivity  axiom  of  probability 

$$P(A) + P(B) = P(A \cap B) + P(A \cup B) ,$$

and the celebrated identity of Bayes-Laplace

$$P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)} ,$$


which  is  a  direct  consequence  of  the  definition  of  conditional  probability. 
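
The step from the definition of conditional probability to the Bayes-Laplace identity can be written out explicitly. The short derivation below assumes only that P(A) > 0 and P(B) > 0, so that both conditional probabilities are defined.

```latex
P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad
P(B \mid A) = \frac{P(A \cap B)}{P(A)}
\;\Longrightarrow\;
P(A \cap B) = P(A \mid B)\, P(B) = P(B \mid A)\, P(A)
\;\Longrightarrow\;
P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)} .
```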

The  bane  of  all  methods  relying  on  the  use  of  classical  probability  procedures  is  the  lack 
of  sufficient  information  about  the  required  values  of  conditional  and  marginal  (a  priori) 
probabilities.  Even  when  assumptions  of  independence  between  variable  values,  i.e., 


$$P((X = x) \wedge (Y = y)) = P(X = x)\, P(Y = y) ,$$


and  conditional  independence  between  variable  values,  i.e., 


$$P(X = x \mid Y = y, Z = z) = P(X = x \mid Y = y) ,$$

are  used  to  simplify  the  required  computations  [27],  the  number  of  variables  involved  in  a 
typical approximate reasoning problem leads to the need to estimate a large number (usually
exponentially  related  to  the  number  of  variables)  of  marginal  and  conditional  probability 
distributions. 

The difficulties inherent in such estimation required early efforts, such as the development
of PROSPECTOR [9], to use probabilistic procedures in combination
with ad hoc or heuristic techniques to overcome problems associated with lack of probabilistic
information and to resolve some inconsistencies that occurred whenever estimated
information overconstrained some probability distributions.

Some of these methodological problems can also be traced to the desire to generalize
the network-based, goal-oriented procedures of classical expert systems to situations where
the traditional truth values of classical logic (i.e., true and false) were generalized to a
continuous scale by equating truth-value with probability. The difficulties involved in such
a generalization were soon apparent, as, for example, the transitivity of implication valid
in conventional inference, that is,

If X implies Y, and if Y implies Z, then X implies Z

fails to hold for probabilities; that is, $P(Y \mid X)$ may be high, $P(Z \mid Y)$ may be high, but
$P(Z \mid X)$ may be zero. Current methodologies based on the use of classical probability
theory to compute the values of a joint probability distribution [22,25] have solved these
methodological problems but, in spite of their deft exploitation of independence assumptions
in probabilistic networks [27], they still face the combinatorial explosion difficulties that are
typical of multivariable problems.
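
The failure of transitivity for conditional probabilities can be checked on a small numerical example. The joint distribution below is hypothetical and was chosen only so that P(Y|X) and P(Z|Y) both equal 0.9 while P(Z|X) is exactly zero; exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction as F

# Hypothetical joint distribution over three binary events (X, Y, Z).
# Worlds not listed have probability zero.
JOINT = {
    (True, True, False): F(9, 100),
    (True, False, False): F(1, 100),
    (False, True, True): F(81, 100),
    (False, False, False): F(9, 100),
}


def prob(event):
    """Probability of the set of worlds where the event holds."""
    return sum(p for world, p in JOINT.items() if event(world))


def cond(event, given):
    """Conditional probability P(event | given); assumes P(given) > 0."""
    return prob(lambda w: event(w) and given(w)) / prob(given)


X = lambda w: w[0]
Y = lambda w: w[1]
Z = lambda w: w[2]

print("P(Y|X) =", cond(Y, X))
print("P(Z|Y) =", cond(Z, Y))
print("P(Z|X) =", cond(Z, X))
```

In this distribution Z never occurs together with X, even though Y usually follows X and Z usually accompanies Y.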

2.2  The  Estimation  of  Probability  Distributions 

If a purely objectivist viewpoint is taken, it is clear that the probability distributions required
to determine the probability of a hypothesis given available evidence may not be
available. In this view, which we hold, probability can only be the result of experience accumulated
through previous observation, and while, theoretically, absent values may be derivable
by empirical means, it is often the case that the required experiments are unfeasible
or impractical. This is particularly true in problems involving systems that are not easy
to manipulate or observe (e.g., evaluation of building damage due to earthquakes) or when
the required information is actively denied or obscured by adversaries (e.g., in military
situation-assessment problems).

The  orthodox  subjectivist  view  of  probability  claims,  on  the  other  hand,  that  it  is 
impossible  to  ignore  the  values  of  probability  distributions,  as  they  are  always  statements 
of  the  degree  of  belief  that  certain  agents  have  about  the  truth  of  hypotheses.  The  rationale 
supporting  the  representation  of  such  beliefs  by  numerical  functions  having  the  properties 
of a probability function is based on the famous “Dutch book” argument [6]: if an agent
is to engage in a gamble involving the truth or falsehood of a certain hypothesis, it will
be irrational for him to choose a combination of bets where he will be sure to lose (a
Dutch book) regardless of how the gamble turns out. Under such conditions, it can
be  shown  that  his  personal  beliefs  (assumed  to  be  numbers)  on  truth  and  falsehood  of 
hypotheses  must  satisfy  the  axioms  of  probability. 

Other  personalistic  axiomatic  systems  have  also  been  proposed  to  support  the  contention 
that  personal  beliefs  on  hypothetical  truth  can  always  be  estimated  using  a  single  numerical 
value [33]. These axiomatic systems have, however, been subject to considerable criticism
both on the basis of their naturalness or rationality [37,21] and on the basis of observation of
the actual behavior of rational agents under controlled circumstances [2,10].

Perhaps  more  controversial  is  the  so-called  “pragmatic  necessity”  argument  proposed  by 
some  decision  scientists  to  justify  their  choice  of  probability  values  in  the  absence  of  relevant 
knowledge.  The  essential  point  of  this  argument  emphasizes  the  decision-oriented  nature  of 
most  approximate  reasoning  problems.  It  is  said  that  if  a  decision  must  be  made,  when  all 
empirical  information  has  been  considered,  then  any  missing  probability  values  (consistent 
with  such  knowledge)  may  be  chosen  because  something,  after  all,  must  be  done.  While 
not claiming that this procedure replaces objectively determined probability values, it is
said that ignorance of such quantities is inconsequential.⁴ Such light dismissal of required
probability values may have, of course, significant undesirable consequences.

Metaphysical principles, such as the principle of insufficient reason or the maximum
entropy principle, that seek to formalize the choice of single distributions on purportedly
“rational” bases other than empirical knowledge are vulnerable to the same criticism. Regardless
of whatever claims some may make invoking pragmatic needs or metaphysics to
develop AI tools to assess complex situations, scientific practice, fundamentally interested
in understanding the world and interacting with it, eschews these practices, relying
instead on experiment-based, hypothesis-testing paradigms.

When it is accepted, at least, that sometimes probability values may be neither
observable nor capable of being elicited, it is clear that probabilistic reasoning techniques
must proceed beyond classical probability calculus and develop alternative computation
schemes that do not assume such informational availability. This generalization does not
require, as is claimed by some, abandoning either the axioms of probability or Bayes’
rule as essential elements of the underlying calculus. Instead, we are simply extending
our computational, rather than our conceptual, schemes to determine the effects of our
ignorance on the results of probabilistic analyses.

3 Generalized Probabilistic Reasoning

Current approaches that generalize the calculus of probabilities are, as stated above, based
on generalization of computational rather than conceptual schemes. As such, the qualifier
“non-Bayesian” that is sometimes associated with them is basically incorrect; its validity
is limited to the current skepticism, among orthodox subjectivists (often called Bayesians),
about their necessity. All of these schemes are based on variations of the same idea: the
determination of intervals [36] where unknown probability values must lie.

3.1 Interval-valued Probabilities

General formalisms for the representation and manipulation of interval probability bounds
have been investigated by Kyburg [20], who also studied issues germane to the relations
between this general formulation and the calculus of evidence of Dempster-Shafer [19]. The
central notion in his treatment of probabilistic knowledge is that of “convex probabilities”
used to describe the set of probability values in multidimensional space where possible values
of the underlying distributions lie.

Although general interval-valued probability is preferable to other schemes, which are
limited by their theoretical representation capabilities, the corresponding calculus of intervals
is hampered by the difficulties associated with the storage and processing of a large
number of probability bounds. If m ground propositions are identified as the initial generators
of a Carnapian universe, it may be necessary to store and manipulate $2^{2^m}$ bounds
corresponding to all subsets of this universe. These difficulties have effectively limited the
application of interval-based approaches in practice.

⁴ It is important to point out, however, that many decision scientists rely, under these circumstances, on
analyses of the sensitivity of their results to such convenient assumptions.

Practical  schemes  that  are  amenable  to  computer-based  implementation,  on  the  other 
hand,  do  not  have  the  same  generality.  In  general,  these  approaches  rely  on  manipulation 
of  intervals  that  have  been  generated  by  knowledge  of  probability  values  for  some  subsets 
that  are  then  used  to  determine  interval  bounds  for  the  probabilities  of  subsets  of  interest 
(i.e., inner or lower probabilities). Among such schemes relying on the use of lower probabilities,
the calculus of evidence of Dempster-Shafer has found the largest acceptance in the
approximate reasoning community.
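
As a minimal illustration of how known probability values for some subsets constrain the probabilities of others, the sketch below computes the classical Fréchet bounds, which follow from the additivity axiom together with 0 ≤ P ≤ 1. This is a standard textbook result used here only as an example; it is not one of the specific interval calculi discussed in the text, and the function names are hypothetical.

```python
from fractions import Fraction


def conjunction_bounds(p_a, p_b):
    """Interval that must contain P(A and B) when only P(A), P(B) are known."""
    # From additivity: P(A and B) = P(A) + P(B) - P(A or B) >= P(A) + P(B) - 1.
    return max(0, p_a + p_b - 1), min(p_a, p_b)


def disjunction_bounds(p_a, p_b):
    """Interval that must contain P(A or B) when only P(A), P(B) are known."""
    return max(p_a, p_b), min(1, p_a + p_b)


# Hypothetical input values.
p_a, p_b = Fraction(7, 10), Fraction(6, 10)
print("P(A and B) lies in", conjunction_bounds(p_a, p_b))
print("P(A or B) lies in", disjunction_bounds(p_a, p_b))
```

Without further information (for instance, an independence assumption) no single value inside these intervals can be singled out, which is the kind of ignorance that interval-based schemes are meant to represent.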

3.2  Evidential  Reasoning 

Evidential reasoning is the name of the methodology based on the Dempster-Shafer calculus
of evidence.⁵ The basic structures of the calculus of evidence were introduced by Dempster
in 1966 [7]. Shafer [34] proposed in 1976 the use of those constructs to represent and
manipulate evidence. The methodology was first applied to the solution of approximate reasoning
problems in artificial intelligence at SRI International [12,23]. Although the calculus
of  evidence  is  often  regarded  as  being  non-Bayesian  (meaning  primarily  nonprobabilistic), 
its  original  derivation  by  Dempster  is  fully  consistent  with  conventional  probability  theory. 
Recent  results  by  Ruspini  [30,31]  have  further  supported  this  contention. 

Evidential  reasoning  is  based  on  the  representation  of  probabilistic  evidence  by  means  of 
mass  functions  or  basic  probability  assignments.  Mass  functions  assign  a  nonnegative  mass 
value  to  every  subset  in  a  space  of  possible  solutions  (or  possible  worlds).  The  sum  of  all 
these  mass  assignments  over  the  set  of  all  such  subsets  (called  the  power  set)  is  always  1. 

Evidential  reasoning  is  advantageous  in  that  it  allows  representation  of  the  degree  of 
support  provided  by  evidence  toward  the  truth  of  a  hypothesis  without  requiring  that  such 
support  be  split  among  more  specific  propositions  implying  that  hypothesis.  For  example,  in 
a criminal investigation case, evidence may indicate that the perpetrator is blonde without
actually establishing his or her identity. In such a case, a mass function that assigns a mass
of 1 to the set of all blonde suspects and 0 to all other subsets is used to represent the
evidential weight. Note that in this case the sum of the masses for all sets consisting of a
single blonde suspect (0) is different from the mass assigned to the set of all blonde suspects
(1). Had masses corresponded to actual probabilities of guilt, those two quantities should
have been the same.⁶

⁵ The reader must be warned about a recent tendency in the literature to use the expression “evidential
reasoning” as a synonym of “approximate reasoning.”

⁶ For this and other reasons it has been claimed that evidential reasoning is non-Bayesian or nonprobabilistic.
As we will see below, this assessment is based on incorrect interpretation of the meaning of mass
functions.

Closely associated with the notion of mass are the belief and plausibility functions defined
by

$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B) ,$$

and

$$\mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B) .$$

The  belief  function  is  a  measure  of  the  total  support  provided  by  evidence  toward  the  truth 
of  a  particular  proposition,  while  the  plausibility  function  measures  the  degree  by  which 
the  evidence  fails  to  refute  it. 
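
The belief and plausibility functions can be sketched directly from their definitions: belief sums the masses of the sets contained in A, and plausibility sums the masses of the sets that intersect A. The suspect names below are hypothetical; the mass function is the one from the blonde-perpetrator example, with all mass on the set of blonde suspects.

```python
def belief(mass, a):
    """Total mass of the focal sets B that are subsets of a."""
    return sum(m for b, m in mass.items() if b <= a)


def plausibility(mass, a):
    """Total mass of the focal sets B that have a nonempty intersection with a."""
    return sum(m for b, m in mass.items() if b & a)


# Hypothetical frame of suspects; "ann" and "bob" are the blonde ones.
SUSPECTS = frozenset({"ann", "bob", "cal", "dee"})
BLONDE = frozenset({"ann", "bob"})

# Mass 1 on the set of all blonde suspects, 0 (omitted) on every other subset.
mass = {BLONDE: 1.0}

for label, subset in [("ann", frozenset({"ann"})),
                      ("blonde", BLONDE),
                      ("cal", frozenset({"cal"}))]:
    print(label, belief(mass, subset), plausibility(mass, subset))
```

For a single blonde suspect the evidence provides no direct support (belief 0) yet does not refute guilt (plausibility 1), so the pair of values brackets the unknown probability of guilt rather than fixing it.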

3.2.1  Logical  Bases  for  Evidential  Reasoning 

Our  possible-worlds  approach  to  the  description  of  probabilistic  reasoning  may  be  extended 
to  develop  a  formal  foundation  for  the  basic  functions  and  structures  of  evidential  reasoning. 
This  extension  is  based  on  the  use  of  a  form  of  modal  logic,  called  epistemic  logic ,  introduced 
to  deal  with  issues  that  are  relevant  to  the  states  of  knowledge  of  rational  agents.  The  insight 
provided  by  this  characterization  has  helped  to  clarify  a  number  of  fundamental  issues  in 
evidential  reasoning,  notably  in  the  areas  of  semantic  characterization  of  the  notion  of 
evidential  independence  and  in  the  derivation  of  schemes  for  the  combination  of  dependent 
and  conditional  evidence. 

Epistemic logic is, like conventional Boolean logic, a two-valued logic where each proposition
is assigned one and only one of the classical truth values, i.e., true or false. In
epistemic  logic,  however,  propositions  may  be  not  only  true  or  false,  but  may  also  be  known 
to  be  true  or  false,  or,  alternatively,  they  may  not  be  known  to  be  either  true  or  false. 
Rather  than  introducing  new  scales  of  truth,  as  is  done  in  multivalued  logic  [29],  epistemic 
logic  resorts  to  a  representation  scheme  where  knowledge  of  a  proposition  is  represented  by 
means  of  another,  related,  proposition. 

A rational agent’s state of knowledge about the truth of a proposition is represented by
means of a special operator K, used as a prefix to symbols describing other propositions.
For example, knowledge of the truth of a proposition p is denoted Kp, while $\neg Kp$ symbolizes
lack of such knowledge.⁷ The discussion of epistemic systems also requires differentiation between
propositions that describe certain properties of the real world (objective propositions)
and propositions that include one or more epistemic operators (epistemic propositions).

In  our  investigation,  we  have  employed  a  particular  form  of  epistemic  logic  proposed 
by  Moore  [24]  to  deal  with  problems  of  reasoning  and  planning  in  artificial  intelligence 
applications. The axiom schemata for such a modal system are:

A1. Axioms of the ordinary propositional calculus.

A2. $Kp \rightarrow p$ (If a proposition is known to be true, then it is true.)

A3. $Kp \rightarrow KKp$ (Positive introspection: If a proposition is known to be true, then
it is known that it is known to be true.)

A4. $K(p \rightarrow q) \rightarrow (Kp \rightarrow Kq)$ (Consequential omniscience: If it is known that p
implies q, then knowledge of the truth of p implies knowledge of the truth
of q.)

A5. If p is an axiom, then Kp is true.

A6. $\neg Kp \rightarrow K \neg Kp$ (Negative introspection: If the truth value of a proposition is
unknown, then such a state of ignorance is known.)

⁷ The meaning of the notation $\neg Kp$ should not be confused with ignorance about the truth of p, represented
by $\neg Kp \wedge \neg K(\neg p)$, i.e., neither p nor its negation is known to be true.

The set of all possible truth assignments to the sentences of a modal propositional
system that satisfy these axioms is called an epistemic universe (Figure 3), a concept
that generalizes that of the Carnapian universe. Each member of this universe is a possible
world that represents both a particular state of the world and the state of knowledge that
certain rational agents have about it. In this universe two classes of subsets are of special
importance.

Figure 3: The Epistemic Universe.

The  first  class  consists  of  subsets  of  possible  worlds  where  some  objective  proposition  p 
is  true.  These  subsets  are  called  truth  sets.  The  truth  set  for  a  proposition  p  is  denoted 
t(p). 

The second class consists of subsets having as members possible worlds where some
objective proposition p is known to be true. These subsets are called support sets, with
k(p) denoting the support set for the objective proposition p.

Closely related to support sets are the epistemic sets, which partition the epistemic
universe into subsets characterized by the same knowledge pattern. Each such epistemic set
may be associated with a proposition p that represents the best or most specific knowledge
available in each possible world within that epistemic set (this proposition is the conjunction
of all known propositions in each world). Epistemic sets are identical to the elements of
the quotient space of the epistemic universe by the accessibility relation. The accessibility
relation captures the informal notion that, for all we know in a possible world w, we might
just as well be in an accessible or conceivable world w'. The epistemic set corresponding to
an objective proposition p is denoted e(p).

Several important set-theoretic relations, illustrated in Figure 4, exist between members
of these classes:

• The support set for a proposition p is the union of the (disjoint) epistemic sets
corresponding to propositions q that imply p, i.e.,

$$k(p) = \bigcup_{q \Rightarrow p} e(q).$$

In plain words, if p is known to be true, it is either because that is the “best available
knowledge,” or because such “most specific knowledge” is that another proposition q,
that implies p, is true.

•  The  support  set  k(p)  is  the  largest  support  set  (in  fact,  it  is  the  largest  arbitrary  union 
of  epistemic  sets)  included  in  the  truth  set  t(p). 
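
The second relation can be sketched on a small finite universe. The sketch below assumes that the epistemic sets are given directly as a partition of the worlds; the world names and the helper `support_set` are hypothetical and serve only as illustration.

```python
from typing import FrozenSet, List, Set

World = str

def support_set(truth_set: Set[World],
                epistemic_sets: List[FrozenSet[World]]) -> Set[World]:
    """k(p): union of all epistemic sets wholly contained in the truth set t(p)."""
    k: Set[World] = set()
    for block in epistemic_sets:
        if block <= truth_set:   # p holds in every world the agent cannot tell apart
            k |= block
    return k

# Hypothetical universe of five worlds partitioned into three epistemic sets.
partition = [frozenset({"w1", "w2"}), frozenset({"w3"}), frozenset({"w4", "w5"})]
t_p = {"w1", "w2", "w3", "w4"}       # worlds where p is true
k_p = support_set(t_p, partition)    # worlds where p is known to be true
```

In this example w4 belongs to t(p) but not to k(p): p is true there without being known, because the agent cannot rule out w5, where p is false.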

Because  epistemic  and  support  sets  are  always  uniquely  associated  with  an  objective 
proposition,  their  probabilities  may  be  thought  of  also  as  measures  that  assign  a  unique 
nonnegative  value  to  each  such  objective  proposition. 

If P is such a probability, the functions

$$m(p) = P(e(p)), \qquad \mathrm{Bel}(p) = P(k(p)),$$

are related by the basic identity

$$\mathrm{Bel}(p) = \sum_{q \Rightarrow p} m(q),$$

which is central to the calculus of evidence [34].
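
A minimal sketch of this identity follows, assuming that each proposition is identified with its (finite) set of possible worlds, so that "q implies p" becomes set inclusion. The mass values and the name `belief` are illustrative assumptions.

```python
from typing import Dict, FrozenSet

Prop = FrozenSet[str]  # a proposition, identified with its set of possible worlds

def belief(m: Dict[Prop, float], p: Prop) -> float:
    """Bel(p): total mass of the propositions q that imply p (q is a subset of p)."""
    return sum(mass for q, mass in m.items() if q <= p)

# Hypothetical mass assignment over a three-world frame {a, b, c}.
m = {
    frozenset({"a"}): 0.5,
    frozenset({"a", "b"}): 0.25,
    frozenset({"a", "b", "c"}): 0.25,
}
bel_ab = belief(m, frozenset({"a", "b"}))
```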

Probabilities  over  the  epistemic  algebra  (and  their  associated  functions)  represent  the 
effect  of  uncertain  evidence  on  a  rational  agent’s  state  of  knowledge.  The  corresponding 
probabilities defined on the truth algebra of the truth sets t(p) can be interpreted as the
degrees  of  likelihood  (usually  unknown)  of  objective  propositions. 

Because the largest member of the epistemic algebra that is contained in the truth set
t(p) is the support set k(p), it follows (from standard results on lower- and upper-probability
functions) that any extension of a probability P, defined over the epistemic algebra, to a
probability defined over the truth algebra (again written P) must satisfy the inequality

$$\mathrm{Bel}(p) \le P(t(p)) \le \mathrm{Pl}(p),$$

where Pl(p) is the plausibility function of the Dempster-Shafer calculus of evidence.
Furthermore, these bounds are the best possible and cannot be improved. In other words,
knowledge of actual probability values over some subsets provides bounds on the probability
values of other sets, and these bounds may not be improved except by incorporation of
additional evidence.
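
The following sketch illustrates why the bounds are attained. It assumes the standard Dempster-Shafer definition of plausibility, Pl(p) = 1 - Bel(not p), which the text does not spell out, and it builds two extreme extensions by moving each focal mass to a single world. All names and numbers are hypothetical.

```python
from typing import Dict, FrozenSet

Prop = FrozenSet[str]  # a proposition, identified with its set of possible worlds

def belief(m: Dict[Prop, float], p: Prop) -> float:
    """Bel(p): mass of focal propositions that imply p."""
    return sum(v for q, v in m.items() if q <= p)

def plausibility(m: Dict[Prop, float], p: Prop) -> float:
    """Pl(p): mass of focal propositions consistent with p."""
    return sum(v for q, v in m.items() if q & p)

def extreme_extension(m: Dict[Prop, float], p: Prop, favor_p: bool) -> float:
    """P(t(p)) when every focal mass is moved to one world, chosen inside p
    whenever possible (favor_p=True) or outside p whenever possible (False)."""
    total = 0.0
    for q, v in m.items():
        lands_in_p = bool(q & p) if favor_p else not (q - p)
        if lands_in_p:
            total += v
    return total

m = {
    frozenset({"a"}): 0.5,
    frozenset({"a", "b"}): 0.25,
    frozenset({"a", "b", "c"}): 0.25,
}
p = frozenset({"b"})
low = extreme_extension(m, p, favor_p=False)
high = extreme_extension(m, p, favor_p=True)
```

Any other way of distributing the focal masses over individual worlds yields a value of P(t(p)) between `low` and `high`.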

Issues  related  to  the  combination  of  evidence  are  readily  modeled  by  considering  another, 
more  complex,  set  of  possible  worlds  called  the  product  epistemic  universe.  The  members  of 
this set are, as was the case in previous epistemic universes, possible worlds, that is, functions
that assign conventional binary truth values (i.e., true or false) to certain propositions of
interest. The difference in this case consists in the use of multiple epistemic operators
$K_1, K_2, \ldots$ representing the knowledge possessed by several rational agents about the truth
of objective propositions or of other epistemic propositions.

Constraining  ourselves  momentarily  to  situations  involving  two  different  rational  agents 
$A_1$ and $A_2$, each ignorant of the knowledge of the other, their common (or integrated)
knowledge may be modeled by introduction of a third, nonindexed, epistemic operator $K$.
It is assumed that the knowledge available to this third agent is the sole and exclusive result
of the combination of the knowledge available to $A_1$ and $A_2$ without any other additional
sources  of  information.  This  assumption  is  formally  modeled  by  means  of  the  following 
knowledge  combination  axiom. 

(KC) $Kp$ is true if and only if there exist sentences $p_1$ and $p_2$ such that $K_1 p_1$ and $K_2 p_2$
are true, and, in addition, such that $p_1 \wedge p_2 \Rightarrow p$.

If the epistemic sets corresponding to the operators $K$, $K_1$, and $K_2$ are denoted by
$e(p)$, $e_1(p)$, and $e_2(p)$, respectively, the following important set-equation, relating all types
of epistemic sets, is the basis for the derivation of a variety of combination formulas:

$$e(p) = \bigcup_{p_1 \wedge p_2 = p} \bigl( e_1(p_1) \cap e_2(p_2) \bigr),$$

from which, under certain assumptions of probabilistic independence, the Dempster
combination formula

$$m(p) = \sum_{p_1 \wedge p_2 = p} m_1(p_1)\, m_2(p_2),$$

is readily derived.
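
The combination formula can be sketched as follows, again identifying propositions with sets of worlds so that conjunction becomes intersection. The formula above carries no normalization term; the optional `normalize` flag, which redistributes the mass assigned to the empty (contradictory) set as in the usual statement of Dempster's rule, is an addition of this sketch, and the example masses are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet

Prop = FrozenSet[str]  # a proposition, identified with its set of possible worlds

def combine(m1: Dict[Prop, float], m2: Dict[Prop, float],
            normalize: bool = False) -> Dict[Prop, float]:
    """m(p) = sum of m1(p1) * m2(p2) over all pairs with p1 and p2 = p."""
    out: Dict[Prop, float] = defaultdict(float)
    for p1, v1 in m1.items():
        for p2, v2 in m2.items():
            out[p1 & p2] += v1 * v2
    result = dict(out)
    if normalize:
        # Assumption of this sketch: discard conflicting mass and rescale.
        conflict = result.pop(frozenset(), 0.0)
        if conflict >= 1.0:
            raise ValueError("the two bodies of evidence are totally conflicting")
        result = {p: v / (1.0 - conflict) for p, v in result.items()}
    return result

frame = frozenset({"a", "b", "c"})
m1 = {frozenset({"a", "b"}): 0.5, frame: 0.5}
m2 = {frozenset({"b", "c"}): 0.5, frame: 0.5}
m12 = combine(m1, m2)
```

In this example the two agents' most specific propositions never contradict each other, so no mass falls on the empty set and normalization would leave the result unchanged.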

3.2.2  Semantic  Issues  of  Evidential  Reasoning 

Using  an  objectivist  interpretation  of  the  concept  of  probability,  the  author  has  formulated 
a Kripke-type model [17] that explicates basic probability assignments as the principal
output estimated by a generalized statistical experiment. This model-theoretic formalism also
sheds light on the general character and nature of probabilistic knowledge and on the
mechanisms used to capture it. Rather than providing a formal characterization of the Kripkean
formulation,  we  will  informally  describe  a  general  model  of  a  statistical  experiment  that 
provides  insight  into  the  nature  of  the  theoretical  structures  discussed  further  below. 

The  informal  model  that  serves  as  our  point  of  departure  is  illustrated  in  Figure  5, 
which  presents  the  typical  steps  involved  in  the  collection  of  statistics  about  the  behavior  of 
a real-world system. A statistical experiment, as illustrated, commences with a mechanism
for the generation of samples (i.e., sequences of possible worlds that reflect the relative
frequency of occurrence of such states of affairs in actual experience).

Figure 5: The General Statistical Experiment.

Each  such  sample  is  then  examined  for  compliance  with  some  experimental  criteria  used 
to  determine  if  the  corresponding  possible  world  satisfies  the  criteria  used  for  the  generation 
of the desired statistical distribution. In other words, we are interested in estimating a
conditional probability, and this test determines whether or not the condition is met. Future
usage  of  the  generated  statistical  values  is  valid  solely  if  available  evidence  (i.e.,  a  true 
proposition  about  the  world)  corresponds  exactly  to  the  condition  used  in  the  generation 
of  the  statistics. 

It  should  be  noted  that  the  nature  of  the  device  (sensor)  used  to  make  this  determination 
is  of  extreme  importance  in  determining  whether  the  generated  statistics  correspond  to  an 
epistemic  probability  over  the  truth  set  t(p)  (e.g.,  if  the  sensor  is  capable  of  reliable  binary 
discrimination  between  samples  where  p  is  true  and  samples  where  p  is  false),  or  over  a 
support  set  k(p)  corresponding  to  a  rational  agent  that  may  or  may  not  be  that  involved 
in  the  next  analysis  step  (e.g.,  some  sensor,  not  necessarily  that  used  to  further  analyze  the 
sample,  is  used  to  determine  if  p  is  valid;  its  failures,  however,  do  not  mean  that  p  is  false). 

If  the  sample  satisfies  the  conditions  defining  the  statistical  distribution  being  estimated, 
then  the  next  step  consists  in  the  determination  of  properties  (i.e.,  propositions  that  are 
true)  in  this  particular  possible  world.  The  conjunctions  of  these  propositions  are  the  “most 
specific  knowledge”  available  for  that  sample.  In  classical  statistical  setups,  the  analyzing 
devices  that  perform  such  a  determination  are  designed  so  as  to  determine  if  the  sample 
falls  into  one  of  several  exclusive  categories.  For  example,  in  clinical  trials,  the  result  of 
each  trial  is  typically  classified  on  the  basis  of  its  success  into  several  disjoint  sets  (e.g., 
“success”  or  “failure”).  In  more  general  experiments,  however,  the  ability  to  determine 
“most  specific  knowledge”  may  be  severely  limited  and  the  sample  will  be  placed  into  one 
of  several  classes  that  may  be  overlapping.  For  example,  if  the  samples  correspond  to 
medical  patients  having  certain  types  of  afflictions  (e.g.,  the  “condition”  is  that  they  have  a 
renal  or  a  hepatic  disorder),  available  knowledge  may  indicate  that  a  particular  patient  has 
a  disorder  within  a  certain  class  (e.g.,  kidney  disease),  while  failing  to  determine  a  specific 
disease. 

If  each  sample  is  so  classified  and  the  results  of  successive  analysis  are  tabulated  as 
frequencies,  the  resulting  distribution  is  a  mass  distribution  in  the  sense  of  Shafer  rather 
than  a  conventional  probability  distribution.  When  the  differences  between  probability 
distributions  and  their  sample-based  estimates  (which  are  often  the  source  of  second-order 
probability  distributions)  are  ignored,  the  computed  frequencies  may  be  considered  to  be  the 
same  as  a  nonconventional  distribution  that  corresponds  to  an  epistemic  probability.  The 
rational  agent  in  this  distribution  is  the  statistical  experimenter  who  has  a  “most  specific 
knowledge”  for  each  possible  world  (actually  for  a  relevant  sample  of  such  worlds).8 
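
The tabulation step can be sketched as follows. Each accepted sample is recorded as the set of alternatives that the analysis could not exclude (its "most specific knowledge"), and the relative frequencies of these possibly overlapping classes form the mass distribution. The disease names and the helper `mass_from_samples` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, FrozenSet, Iterable, Set

def mass_from_samples(observations: Iterable[Set[str]]) -> Dict[FrozenSet[str], Fraction]:
    """Relative frequency of each class of outcomes, used as a mass distribution."""
    counts = Counter(frozenset(obs) for obs in observations)
    n = sum(counts.values())
    return {cls: Fraction(c, n) for cls, c in counts.items()}

# Four hypothetical patients satisfying the condition "renal or hepatic disorder".
observations = [
    {"nephritis"},                    # specific diagnosis reached
    {"nephritis", "kidney stones"},   # only "kidney disease" could be established
    {"hepatitis"},
    {"nephritis", "kidney stones"},
]
m = mass_from_samples(observations)
```

Because the class {nephritis, kidney stones} overlaps the class {nephritis}, the tabulated frequencies do not define an ordinary probability distribution over diseases.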

The knowledge of the approximate reasoner, on the other hand, is limited to knowledge of
(aggregated) results of the statistical experiment coupled with knowledge of the condition
validating the use of the statistical (epistemic) distribution (i.e., the condition used to
determine if the samples were acceptable). Note that this distribution generally induces
bounds on the probability of truth sets. The latter, however, are needed to solve typical
decision-making problems.

8 Note that in classical experimental setups, where the conditions of the experiment may be closely
controlled, the most specific knowledge corresponds to the determination of the actual possible world where
the sample lies. In those cases, the sample frequencies estimate probability values for an actual probability
distribution.

In closing our description of the calculus of evidence, it is important to point out that, in
addition to our objectivist model, subjectivist interpretations of belief and mass functions
have been proposed by Smets [35] and Jaffray [16]. The formulation of Jaffray is
particularly attractive in that it provides a simple, direct generalization of the basic results of
de Finetti [6] on the probabilistic nature of degrees of belief.

4  Possibilistic  Reasoning 

Possibilistic approaches produce, as is the case with their probabilistic counterparts,
solutions to problems that are a modified formulation of the impossible (or, at least, very
difficult) task of determining hypothesis validity. The emphasis, however, is not on
determination of the frequency of instances where, under similar conditions, the hypothesis
will be true or false. Possibilistic methods seek to produce unequivocal answers to other
questions that are similar in some sense to those of interest to the system analyst.

For  example,  in  a  medical  diagnosis  problem,  a  probabilistic  method  may  answer  the 
question  “Does  the  patient  have  disease  D?”  by  means  of  a  probability  value  that  fails 
to  indicate  whether  the  disease  exists  or  not  but  that  allows  evaluation  of  the  chances  of 
successful  treatment.  A  possibilistic  method,  on  the  other  hand,  may  answer  the  same 
question  by  responding  unequivocally  (i.e.,  true  or  false)  to  the  modified  query  “Does  the 
patient  have  a  disease  of  type  Dm?”  where  Dm  stands  for  a  class  of  diseases  that  are  similar, 
in  some  sense,  to  the  disease  D. 

Similarity  between  propositions  (sometimes  regarded  as  the  “degree  of  ease”  by  which 
a  proposition  describes  a  particular  state  of  affairs)  may  be  used  as  the  basis  for  explaining 
the  basic  concepts  and  structures  of  fuzzy  set  theory  and  its  logic-oriented  extensions. 

A fuzzy set $f$ [41] is defined by its membership function, which maps elements of a universe
$\mathcal{U}$ to the [0, 1] interval of the real line:

$$\mu_f : \mathcal{U} \to [0, 1].$$

The concept of membership function generalizes the notion of characteristic function of a
conventional set. For a particular element $x$ of $\mathcal{U}$, the value $\mu_f(x)$ represents the degree
of membership of $x$ in the fuzzy set $f$. Unlike conventional sets, where elements either
belong or do not belong to a set, fuzzy sets, which represent vague concepts, admit partial
membership ranging from 0 (nonmembership) to 1 (full membership).

Fuzzy sets may also be described by means of their $\alpha$-cuts, consisting of all members
with a degree of membership greater than or equal to a value $\alpha$:

$$f(\alpha) = \{ x : \mu_f(x) \ge \alpha \}.$$

Using this important concept, fuzzy sets may also be regarded, from a logical viewpoint, as
a set of related indexed propositions representing different levels of conceptual applicability
to a particular state of affairs.
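
As a small illustration, assume a fuzzy set over a finite universe stored as a dictionary from elements to membership degrees. The set `tall` (heights in centimeters) and the helper `alpha_cut` are hypothetical.

```python
from typing import Dict, Hashable, Set

def alpha_cut(mu: Dict[Hashable, float], alpha: float) -> Set[Hashable]:
    """All elements whose degree of membership is at least alpha."""
    return {x for x, degree in mu.items() if degree >= alpha}

# Hypothetical membership function for the vague concept "tall".
tall = {150: 0.0, 165: 0.25, 175: 0.5, 185: 0.75, 195: 1.0}

cut_050 = alpha_cut(tall, 0.5)
cut_075 = alpha_cut(tall, 0.75)
```

Each cut is an ordinary set, and the cuts are nested: raising the level can only remove elements, which matches the reading of a fuzzy set as a family of indexed propositions of decreasing applicability.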

The set-theoretic operations (union, intersection, complementation), originally proposed
by Zadeh [41], generalize the corresponding operations for conventional sets:

$$\mu_{f \cap g}(x) = \min[\mu_f(x), \mu_g(x)],$$

$$\mu_{f \cup g}(x) = \max[\mu_f(x), \mu_g(x)],$$

$$\mu_{\bar{f}}(x) = 1 - \mu_f(x),$$

where $x$ is a member of the universe $\mathcal{U}$.
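
These three operations can be sketched directly, assuming that both fuzzy sets are stored as dictionaries over the same finite universe. The sets `f` and `g` are hypothetical.

```python
from typing import Dict, Hashable

Fuzzy = Dict[Hashable, float]

def intersection(mu_f: Fuzzy, mu_g: Fuzzy) -> Fuzzy:
    """Pointwise minimum of the two membership functions."""
    return {x: min(mu_f[x], mu_g[x]) for x in mu_f}

def union(mu_f: Fuzzy, mu_g: Fuzzy) -> Fuzzy:
    """Pointwise maximum of the two membership functions."""
    return {x: max(mu_f[x], mu_g[x]) for x in mu_f}

def complement(mu_f: Fuzzy) -> Fuzzy:
    """One minus the membership degree."""
    return {x: 1.0 - d for x, d in mu_f.items()}

f = {"x1": 0.25, "x2": 0.75, "x3": 1.0}
g = {"x1": 0.5, "x2": 0.5, "x3": 0.0}

f_and_g = intersection(f, g)
f_or_g = union(f, g)
not_f = complement(f)
f_and_not_f = intersection(f, not_f)
```

When all degrees are 0 or 1 the operations reduce to the conventional ones. With partial membership, however, a set and its complement may overlap, as `f_and_not_f` shows.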

An important concept in fuzzy set theory is that of fuzzy relation, which generalizes
the conventional set-theoretic notion of relation. If $\mathcal{U}$ and $\mathcal{V}$ are universes, then a fuzzy
relation between $\mathcal{U}$ and $\mathcal{V}$ is a fuzzy set in the set of all pairs $(u, v)$ (or cartesian product),
where $u$ is an element of $\mathcal{U}$ and where $v$ is an element of $\mathcal{V}$. One of the main reasons for
the importance of fuzzy relations is their role in the representation of vague relationships
between variables, e.g.,

If u is high, then v is small.

Approximate reasoning systems of the possibilistic kind use fuzzy relations to represent
inferential rules in their knowledge bases.

4.1  Possibility  Theory 

Possibility  theory  is  based  on  the  representation  of  vague  information  as  elastic  constraints 
on  the  possible  values  that  may  be  attained  by  a  variable.  For  example,  if  information  is 
available  indicating  that  “James  is  rich,”  a  possibilistic  approach  represents  this  fact  as  a 
possibility  distribution  on  the  values  of  a  variable  describing  James’s  wealth  (called  here 
James-net-worth)  in  the  form 

$$\Pi_{\text{James-net-worth}} = \text{rich},$$

where  rich  is  a  fuzzy  set  defined  over  the  real  numbers  intended  to  describe  for  each  possible 
value  of  James-net-worth  the  degree  of  ease  by  which  the  concept  “rich”  agrees  with  that 
particular  net  worth. 

In general, if a variable $X$ takes values over a universe $\mathcal{U}$, then a linguistic expression of
the form “X is F” will be formally translated by a possibilistic assignment $\Pi_X = F$, such
translation being denoted as

$$X \text{ is } F \;\to\; \Pi_X = F,$$

meaning that the values that may be attained by X are constrained as specified by the fuzzy
set F. Because vague statements in natural language are translated, in possibility theory,
into formal statements that assign a fuzzy value to a variable (as opposed to assigning
a precise value as would be the case for a precise statement), such a variable is called a
linguistic variable.

Other translation rules, used to derive representations for more complicated linguistic
statements such as “X is F and Y is G” or “Q As are F” (where Q is a generalized quantifier
such as “most”), are the basis of an uncertainty calculus that is complemented by certain
inferential rules that allow derivation of possibilistic constraints for certain variables as a
function of constraints on related variables. Among these rules the most important is the
“generalized modus ponens” that produces an approximate conclusion

$$\Pi_Y = G',$$

meaning “Y is G'”, from knowledge that

$$\Pi_X = F',$$

meaning that “X is F'”, and that

$$\Pi_{Y|X} = (F \to G),$$

i.e., “If X is F, then Y is G”.

The  qualifier  “generalized”  is  used  to  indicate  the  important  fact  that,  unlike  classical 
modus  ponens,  this  inference  rule  allows  a  rule  to  be  used  even  when  available  facts,  F',  do 
not  match  precisely  the  antecedent  of  the  rule  (i.e.,  F).  The  conclusion  G'  in  such  a  case 
differs  also,  in  general,  from  the  consequent  of  the  rule,  being  a  more  general  or  less  specific 
constraint than G.
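
The text does not give an explicit formula for the rule. The sketch below shows one common instantiation, chosen here as an assumption: the conclusion is obtained by sup-composition of the fact F' with the rule, using the Lukasiewicz triangular norm and its associated implication. The fuzzy sets `high`, `rather_high`, and `small` are hypothetical.

```python
from typing import Dict, Hashable

Fuzzy = Dict[Hashable, float]

def t_luk(a: float, b: float) -> float:
    """Lukasiewicz triangular norm (assumed choice of conjunction)."""
    return max(a + b - 1.0, 0.0)

def imp_luk(a: float, b: float) -> float:
    """Lukasiewicz implication (assumed reading of the rule F -> G)."""
    return min(1.0, 1.0 - a + b)

def generalized_modus_ponens(fact: Fuzzy, antecedent: Fuzzy, consequent: Fuzzy) -> Fuzzy:
    """G'(y) = max over x of T(F'(x), F(x) -> G(y)) on finite universes."""
    return {
        y: max(t_luk(fact[x], imp_luk(antecedent[x], g_y)) for x in antecedent)
        for y, g_y in consequent.items()
    }

high = {1: 0.0, 2: 0.5, 3: 1.0}           # F: "X is high"
small = {"a": 1.0, "b": 0.5, "c": 0.0}    # G: "Y is small"
rather_high = {1: 0.0, 2: 1.0, 3: 0.5}    # F': fact that only resembles F

exact = generalized_modus_ponens(high, high, small)
approximate = generalized_modus_ponens(rather_high, high, small)
```

Under these assumptions the rule returns G itself when the fact matches the antecedent exactly, and a less specific constraint (pointwise larger than G) when the fact only resembles it.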

4.2  Similarity  Relations  and  Possible  Worlds 

A similarity relation in a set X is a function that assigns a real value between 0 and 1 to every
pair of objects from X. Similarity relations play an important role, recently investigated
in  detail  by  the  author  [32],  in  the  interpretation  of  the  basic  concepts  and  structures  of 
possibility  theory.  The  results  of  this  research  show  that  the  notion  of  possibility  may  be 
explained  in  terms  of  a  similarity  function  defined  over  a  universe  of  possible  worlds.  This 
similarity  defines  a  metric  that  quantifies  the  extent  of  resemblance  between  pairs  of  states 
(as  evaluated  from  the  viewpoint  of  the  particular  problem  being  considered).  For  example, 
in  a  planning  problem,  the  planner  may  use  such  measures  to  describe  the  extent  by  which 
the  plan’s  effects  resemble  some  planning  goal  or  objective. 

The value $S(w, w')$ that a similarity relation assigns to a pair of worlds $(w, w')$ in a
universe $\mathcal{U}$ is a numerical$^{9}$ measure of the extent by which propositions that are true at
$w$ may be expected to hold true at $w'$. A similarity value of 1 for $S(w, w')$ (the highest
possible) indicates that, from the point of view of the propositions used to construct our
universe, both worlds are indiscernible, i.e., that the same propositions are true in $w$ and in
$w'$. A value of 0, in contrast, tells us that knowledge of propositional truth in $w$ does not
have any predictive value over truth-values in $w'$ (and vice versa).

9 The requirement that similarities be numerical may be relaxed considerably. We shall confine our
exposition, however, to [0, 1]-valued similarities for the sake of clarity.

Unlike  probability  values  that  represent  the  behavior  of  a  system  and,  as  such,  are  a 
property of the system (the same may be said, under a subjectivist interpretation, of
degrees  of  belief  as  a  property  of  a  rational  agent),  similarity  functions  are  arbitrarily 
defined  (but  not  necessarily  subjective)  scales  that  facilitate  the  description  of  the  degree 
by  which  an  object  has  some  property.  Thus,  similarities  are  as  useful  (and  arbitrary)  as 
any  other  metric  scale;  their  utility  is  essentially  a  function  of  the  degree  by  which  the  scale 
distinguishes  between  different  states  of  a  system  and  the  degree  by  which  similarity  scales 
that  are  associated  with  different  properties  (e.g.,  the  pressure  and  volume  of  a  perfect  gas) 
are  related  to  each  other  by  means  of  actual  physical  laws  (or  facilitate  the  expression  of 
such  laws). 

Simply stated, similarities provide the measurement sticks that must be employed to
characterize, in an approximate fashion, the state of the real world. Correspondingly,
approximate inference rules describe how similarity from some respect (e.g., resemblance of the
actual state, pressure = 80 kg/m², to some prototypical situation, pressure > 100 kg/m²),
relates to similarity from another viewpoint (e.g., temperature > 200 °C), by means of a
fuzzy relation (e.g., “If the pressure is high, then the temperature is high”).

4.2.1  Properties  of  Similarities:  Triangular  Norms 

A similarity function $S$ defined on a possible-world universe $\mathcal{U}$ may be regarded as a
generalization of the modal-logic notion of accessibility or conceivability [15], by introduction
of multiple binary relations $R_\alpha$ between possible worlds (one for each value of $\alpha$ between 0
and 1), defined by

$$R_\alpha(w, w') \text{ if and only if } S(w, w') \ge \alpha.$$

Using  these  relations,  we  may  say  that  conditions  in  w  are  possible  to  some  degree  in  w'  on 
the  basis  of  the  value  of  S(w,  w')  (generalizing  the  classical  definition  of  the  modal  operator 
for  possible  truth). 
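
A short sketch of this family of relations, assuming a finite universe and a similarity stored as a dictionary over ordered pairs of worlds (the three worlds and their similarity values are hypothetical):

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]

def accessibility(S: Dict[Pair, float], alpha: float) -> Set[Pair]:
    """R_alpha: pairs of worlds whose similarity is at least alpha."""
    return {pair for pair, s in S.items() if s >= alpha}

worlds = ["w1", "w2", "w3"]
off_diagonal = {("w1", "w2"): 0.75, ("w2", "w3"): 0.5, ("w1", "w3"): 0.5}

# Build a reflexive, symmetric similarity table.
S: Dict[Pair, float] = {(w, w): 1.0 for w in worlds}
for (a, b), s in off_diagonal.items():
    S[(a, b)] = S[(b, a)] = s

R_100 = accessibility(S, 1.0)
R_075 = accessibility(S, 0.75)
R_050 = accessibility(S, 0.5)
```

The relations are nested: lowering the level can only add pairs, so that more worlds become conceivable as the required degree of resemblance is relaxed.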

For the function $S$ to qualify as a similarity function, a number of
properties must be required so that $S$ is truly a measure of resemblance between
objects. Among these, the requirements that $S(w, w) = 1$ (i.e., the similarity between any
world and itself is as high as possible), and that $S(w, w') = S(w', w)$ (i.e., $w$ resembles $w'$
as much as $w'$ resembles $w$) are rather natural.

Less obvious than those properties is a form of transitivity that may be motivated by
noting that if $S$ were to assign values of similarity to the pairs $(w, w')$ and $(w', w'')$ that
make both $w$ and $w'$ highly similar and $w'$ and $w''$ also highly similar, then it would be
surprising if $w$ and $w''$ did not resemble each other at all. Any function claiming to measure
resemblance must be such, therefore, that the similarity value $S(w, w'')$ is bounded from
below by a function of $S(w, w')$ and $S(w', w'')$, expressed by means of a binary operation $\circledast$
in the form

$$S(w, w'') \ge S(w, w') \circledast S(w', w''),$$

which is graphically illustrated in Figure 6.

Figure 6: Transitivity of the Similarity Relation.

In terms of accessibility relations, this condition is a generalization of the classical
expression for the transitivity of $R$, i.e.,

$$R \supseteq R \circ R,$$

to the form

$$R_{\alpha \circledast \beta} \supseteq R_\alpha \circ R_\beta, \quad \text{for all } 0 \le \alpha, \beta \le 1,$$

involving the multiple relations $R_\alpha$.

Imposition of reasonable requirements upon the operation $\circledast$ immediately shows it to be
a triangular norm, introduced here by means of arguments related to metrics and similarity,
but of extreme importance, otherwise, in multivalued logic [38]. Important examples of this
operation include the functions

$$a \circledast b = \min(a, b), \quad a \circledast b = \max(a + b - 1, 0), \quad \text{and} \quad a \circledast b = ab,$$

called the Zadeh, Lukasiewicz, and product triangular norms, respectively.
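
The three norms are easily written down and checked numerically. The sketch below assumes the usual axioms of a triangular norm (commutativity, associativity, monotonicity, and 1 as neutral element), which the text alludes to only as "reasonable requirements", and tests them on a small grid of values.

```python
from itertools import product
from typing import Callable, List

TNorm = Callable[[float, float], float]

def t_zadeh(a: float, b: float) -> float:
    return min(a, b)

def t_lukasiewicz(a: float, b: float) -> float:
    return max(a + b - 1.0, 0.0)

def t_product(a: float, b: float) -> float:
    return a * b

def is_triangular_norm(t: TNorm, grid: List[float]) -> bool:
    """Check the assumed t-norm axioms on every combination of grid values."""
    commutative = all(t(a, b) == t(b, a) for a, b in product(grid, repeat=2))
    neutral = all(t(a, 1.0) == a for a in grid)
    monotone = all(t(a, b) <= t(a, c)
                   for a, b, c in product(grid, repeat=3) if b <= c)
    associative = all(t(a, t(b, c)) == t(t(a, b), c)
                      for a, b, c in product(grid, repeat=3))
    return commutative and neutral and monotone and associative

# Dyadic grid, so that all floating-point operations below are exact.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
all_ok = all(is_triangular_norm(t, grid) for t in (t_zadeh, t_lukasiewicz, t_product))
ordered = all(t_lukasiewicz(a, b) <= t_product(a, b) <= t_zadeh(a, b)
              for a, b in product(grid, repeat=2))
```

The ordering check reflects the fact that the Zadeh norm imposes the most demanding form of transitivity of the three and the Lukasiewicz norm the least demanding.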

If a function $\delta$ is defined, between pairs of possible worlds, by means of the relation

$$\delta = 1 - S,$$

then it may be seen that when $\circledast$ is the triangular norm of Lukasiewicz, $\delta$ is an ordinary
metric or distance, satisfying the well-known triangular inequality

$$\delta(w, w'') \le \delta(w, w') + \delta(w', w'').$$

When $\circledast$ is the Zadeh triangular norm, however, the transitivity property is equivalent to
the more stringent condition

$$\delta(w, w'') \le \max\bigl( \delta(w, w'), \delta(w', w'') \bigr),$$

stating that $\delta$ is an ultrametric distance.
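
Both correspondences can be verified on small examples. The sketch assumes finite universes and two hypothetical similarity tables: one transitive under the Zadeh norm, and one transitive under the Lukasiewicz norm only.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]
Table = Dict[Pair, float]

def make_similarity(worlds: List[str], off_diagonal: Table) -> Table:
    """Reflexive, symmetric similarity table built from its off-diagonal values."""
    S = {(w, w): 1.0 for w in worlds}
    for (a, b), s in off_diagonal.items():
        S[(a, b)] = S[(b, a)] = s
    return S

def is_transitive(S: Table, worlds: List[str],
                  tnorm: Callable[[float, float], float]) -> bool:
    """S(a, c) >= tnorm(S(a, b), S(b, c)) for every triple of worlds."""
    return all(S[(a, c)] >= tnorm(S[(a, b)], S[(b, c)])
               for a, b, c in product(worlds, repeat=3))

def distance_satisfies(S: Table, worlds: List[str],
                       bound: Callable[[float, float], float]) -> bool:
    """delta(a, c) <= bound(delta(a, b), delta(b, c)) with delta = 1 - S."""
    return all(1.0 - S[(a, c)] <= bound(1.0 - S[(a, b)], 1.0 - S[(b, c)])
               for a, b, c in product(worlds, repeat=3))

def t_luk(a: float, b: float) -> float:
    return max(a + b - 1.0, 0.0)

def add(x: float, y: float) -> float:
    return x + y

worlds = ["w1", "w2", "w3"]
S_ultra = make_similarity(worlds, {("w1", "w2"): 0.75, ("w2", "w3"): 0.5, ("w1", "w3"): 0.5})
S_metric = make_similarity(worlds, {("w1", "w2"): 0.75, ("w2", "w3"): 0.75, ("w1", "w3"): 0.5})
```

For `S_metric`, the distance 1 - S obeys the triangular inequality but not the ultrametric one, in agreement with the failure of Zadeh transitivity for that table.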

4.2.2  Logic  and  Metrics:  The  Generalized  Modus  Ponens

Metric  structures,  introduced  by  means  of  similarity  relations,  provide  a  mechanism  for  the 
characterization  of  logical  relations  by  means  of  structures  that  stress  proximity  rather  than 
subset-membership  relations  between  possible  worlds. 

If a typical “conditional” proposition in Boolean logic, i.e., “If q, then p,” is thought of
as a statement that every world where q is true is one where also p is true, then it is clear
that implications are equivalent, as is well known, to a relationship of inclusion between
sets of possible worlds: the set of q-worlds is a subset of the set of p-worlds.

Statements of inclusion between subsets of possible worlds may, however, also be
characterized in metric terms stating that every q-world has a p-world (i.e., itself) that is
as similar as possible to it. Logic structures, however, allow us only to say that either q
implies p, or that q implies the negation $\neg p$, or that neither of those statements is true.
Similarity relations, by contrast, permit the measurement of the amount by which a set
must be “stretched” (as illustrated in Figure 7) in order for an inclusion relation to hold.


Figure  7:  Extended  Set  Inclusion. 

One such measure of inclusion is provided by the function $I$ (called the degree of
implication), defined for pairs of propositions $p$ and $q$ by the expression

$$I(p \mid q) = \inf_{w' \vdash q} \, \sup_{w \vdash p} S(w, w'),$$

which is related to the well-known Hausdorff distance, introduced in metric space theory to
measure distance between subsets as a function of the distance between their elements.

Note, in particular, that if $I(p \mid q) = 1$, then every q-world is similar to a p-world that is logically
“indistinguishable” from it (i.e., q implies p), while if both $I(p \mid q)$ and $I(q \mid p)$ are equal to
1, then p and q are logically equivalent.
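
On a finite universe the infimum and supremum become a minimum and a maximum, and the degree of implication can be computed directly. In the sketch below propositions are identified with nonempty sets of worlds; the similarity table and the propositions `p`, `q`, and `r` are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Pair = Tuple[str, str]

def degree_of_implication(S: Dict[Pair, float],
                          p: FrozenSet[str], q: FrozenSet[str]) -> float:
    """I(p | q): for the worst-placed q-world, similarity to its closest p-world."""
    return min(max(S[(w, w_prime)] for w in p) for w_prime in q)

worlds = ["w1", "w2", "w3"]
off_diagonal = {("w1", "w2"): 0.75, ("w2", "w3"): 0.5, ("w1", "w3"): 0.5}
S: Dict[Pair, float] = {(w, w): 1.0 for w in worlds}
for (a, b), s in off_diagonal.items():
    S[(a, b)] = S[(b, a)] = s

p = frozenset({"w1", "w2"})
q = frozenset({"w1"})
r = frozenset({"w3"})

i_p_given_q = degree_of_implication(S, p, q)   # q is included in p
i_q_given_p = degree_of_implication(S, q, p)   # p must be "stretched" to fit q
i_p_given_r = degree_of_implication(S, p, r)
```

Here q implies p, so I(p | q) = 1, whereas the converse inclusion fails and holds only to the degree 0.75.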

From this perspective, if inferential rules, such as the modus ponens, are thought of as
the tools of an “implicational” calculus, i.e., “If q is a subset of p, and r is a subset of q, then
r is a subset of p”, then possibility theory generalizes such calculus by deriving relations
between neighborhoods of certain subsets of possible worlds (actually between their sizes).

The generalized modus ponens of Zadeh [39] is a direct consequence of the transitivity
property

$$I(p \mid r) \ge I(p \mid q) \circledast I(q \mid r),$$

of the degree-of-implication function, which is illustrated in Figure 8.


Figure  8:  The  Generalized  Modus  Ponens. 
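
The transitivity property may be checked exhaustively on a small example. The sketch assumes a finite universe, a hypothetical similarity table that is transitive under the Zadeh norm (minimum), and propositions identified with nonempty sets of worlds.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[str, str]

def degree_of_implication(S: Dict[Pair, float],
                          p: FrozenSet[str], q: FrozenSet[str]) -> float:
    """I(p | q) = min over q-worlds w' of max over p-worlds w of S(w, w')."""
    return min(max(S[(w, w_prime)] for w in p) for w_prime in q)

def nonempty_subsets(worlds: List[str]) -> List[FrozenSet[str]]:
    return [frozenset(c)
            for size in range(1, len(worlds) + 1)
            for c in combinations(worlds, size)]

worlds = ["w1", "w2", "w3"]
off_diagonal = {("w1", "w2"): 0.75, ("w2", "w3"): 0.5, ("w1", "w3"): 0.5}
S: Dict[Pair, float] = {(w, w): 1.0 for w in worlds}
for (a, b), s in off_diagonal.items():
    S[(a, b)] = S[(b, a)] = s

subsets = nonempty_subsets(worlds)
transitive = all(
    degree_of_implication(S, p, r)
    >= min(degree_of_implication(S, p, q), degree_of_implication(S, q, r))
    for p, q, r in product(subsets, repeat=3)
)
```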

Derivation of the actual form of the generalized modus ponens from similarity-based
structures, which involve possibility distributions, is outside of the scope of this paper.
It will suffice to say here that possibility distributions measure the similarity, from the
restricted viewpoint (called marginal similarity) of one or more variables, between certain
subsets of possible worlds, and that fuzzy inference rules provide metric knowledge about
inclusion relations between such subsets.

In  closing,  it  is  important  to  stress  that  similarity  relations  justify  the  use  of  possibilistic 
logic  as  a  form  of  “logical  extrapolation”  exploiting  similarities  between  possible  worlds. 
The  topological  and  metric  structures  that  are  introduced  to  enhance  our  basic  Carnapian 
universe  are  of  a  substantially  different  nature  than  the  set  measures  exploited  by  probability 
theory  that,  typically,  measure  the  “sizes”  of  the  complementary  subsets  of  possible  worlds 
where  a  proposition  is  true  or  false,  respectively. 

5  Nonmonotonic  Logic  and  Commonsense  Reasoning 

Nonmonotonic  logic  and  commonsense  reasoning  are  also  concerned  with  the  problems 
caused  by  lack  of  the  information  that  is  required  to  deduce  the  truth  value  of  certain 
hypotheses.  As  is  the  case  with  approximate  reasoning  methodologies,  these  concerns  go 
beyond  considerations  about  the  theoretical  ability  to  produce  the  required  knowledge, 
encompassing also the practical issues involved in such production. To use a most famous
example, to deduce that a particular bird flies requires knowledge that the bird is not a
penguin or ostrich (at least, a nonflying ostrich), that it is not sick, dead, and so forth.
The production and storage of this information impose heavy burdens on both users and
systems.

5.1 Nonmonotonic Logic

Faced with the impossibility of collecting such information, nonmonotonic logic systems [28,
13, 8] are also forced to deal with a subset of possible solutions. Rather than relying on
descriptions of extensive properties of such a set, as done by approximate reasoning methods,
nonmonotonic  procedures  choose  one  of  its  members.  If  subsequent  information  eliminates 
that  choice  as  a  candidate,  then  one  or  more  of  the  “defeasible”  assumptions  are  retracted. 
Use  of  the  term  nonmonotonic  to  characterize  this  type  of  reasoning  is  intended  to  reflect 
both  the  nature  of  the  variation  of  truth  values  and  the  corresponding  changes  in  the  set  of 
true  statements  as  the  consequence  of  the  assimilation  of  new  information  (classical  logic 
methods  always  add  new  truths  to  the  set  of  existing  theorems,  thus  leading  to  “smaller” 
sets  of  possible  worlds). 

The  majority  of  nonmonotonic  logic  techniques  rely  on  minimality  arguments  to  choose 
possible  worlds  among  a  set  of  potential  solutions.  The  general  idea  of  these  methods 
consists  in  the  identification  of  a  “least  exceptional”  world,  that  is,  a  world  where  the  only 
objects  that  satisfy  certain  predicates  are  precisely  those  that  are  known  to  do  so.  Recent 
work  [3]  has  extended  these  ideas  to  the  approximate  reasoning  domain  by  consideration 
of  numerical  degrees  of  exceptionality. 

Similar commonsense reasoning techniques [28], notably default reasoning, are also
related to probabilistic reasoning. Default assumptions (such as the hypothesis that, by
default, birds fly) can be thought of as stating that the assumption, given our current state
of knowledge, has a high probability of being true. Known characteristics of default
reasoning, notably the lack of transitivity of the modus ponens, have equivalent counterparts
in probabilistic reasoning.
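
The probabilistic counterpart of this lack of transitivity can be checked on a small example. The sketch below uses an invented population (all counts are hypothetical) in which "penguins are birds" and "birds fly" both hold with high probability, while "penguins fly" holds with probability zero.

```python
from fractions import Fraction

# Invented population: (is_penguin, is_bird, flies) -> number of individuals.
POPULATION = {
    (True, True, False): 5,      # penguins: birds that do not fly
    (False, True, True): 94,     # ordinary flying birds
    (False, True, False): 1,     # other nonflying birds
    (False, False, False): 100,  # non-birds
}

def conditional(event, given):
    """P(event | given), computed by counting individuals."""
    joint = sum(n for k, n in POPULATION.items() if given(k) and event(k))
    total = sum(n for k, n in POPULATION.items() if given(k))
    return Fraction(joint, total)

def is_penguin(k):
    return k[0]

def is_bird(k):
    return k[1]

def can_fly(k):
    return k[2]

p_bird_given_penguin = conditional(is_bird, is_penguin)
p_flies_given_bird = conditional(can_fly, is_bird)
p_flies_given_penguin = conditional(can_fly, is_penguin)
```

Chaining the two high-probability statements would suggest that penguins fly; the conditional probability computed directly from the same data shows that this inference is not warranted.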

Studies of problems where knowledge is expressed by high probability statements [26, 1]
and developments in possibilistic reasoning techniques concerned with the manipulation of
certain generalized quantifiers (e.g., “most”) [40] and with linguistic statements of
probability (e.g., “usually”) [43] have also shown substantial similarities between default and
probabilistic reasoning.

5.2  Qualitative  Process  Theory 

A number of recent research efforts [11, 14, 18] have been oriented toward the development
of  methods  and  techniques  for  the  description  of  qualitative  aspects  of  system  behavior. 
The  basic  idea  of  these  qualitative  or  “naive”  physics  approaches  is  the  development  of  a 
computer-assisted  understanding  of  the  major  behavioral  characteristics  of  systems  of  major 
practical  interest. 

These efforts have emphasized the use of imprecise descriptions in order to avoid
unnecessary numerical detail that, according to their proponents, would complicate rather
than aid understanding of causal relationships and system behavior. This concern is
similar to that which originally motivated the introduction of fuzzy set theory, which sought
to provide tools to produce understandable descriptions of large and complex systems by
avoidance of unnecessary descriptive detail.

The relationships between the theories go considerably beyond their common goals and
objectives, as qualitative process theory has made substantial use of imprecise scalar-variable
scales that recognize three possible classes of values: negative, zero, and positive. These
values  are  special  cases  of  linguistic  variables,  introduced  in  fuzzy  set  theory  [42],  which 
provide  for  the  qualitative  description  of  scalar  variables  using  formal  representations  of 
linguistic  qualifiers  such  as  large,  very  large,  and  small.  The  relationship  between  the 
theories  is  the  current  object  of  substantial  attention. 
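
A three-valued scale of this kind can be sketched as a small sign algebra. The code below is an illustration only: the function names and the extra value `AMBIGUOUS` (needed when a sum of opposite signs cannot be determined) are assumptions of this sketch and are not taken from the cited works.

```python
NEG, ZERO, POS, AMBIGUOUS = "negative", "zero", "positive", "ambiguous"

def sign_of(x):
    """Map a number to its qualitative class."""
    if x < 0:
        return NEG
    if x > 0:
        return POS
    return ZERO

def q_add(a, b):
    """Qualitative sum: opposite signs give an undetermined result."""
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    return a if a == b else AMBIGUOUS

def q_mul(a, b):
    """Qualitative product: zero dominates, then ambiguity, then sign rules."""
    if ZERO in (a, b):
        return ZERO
    if AMBIGUOUS in (a, b):
        return AMBIGUOUS
    return POS if a == b else NEG
```

The loss of information in `q_add(POS, NEG)` is the price paid for discarding numerical detail; linguistic variables with more classes (small, large, very large) refine the same idea.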

6  Conclusions 

Possible-world  semantics  provides  a  perspective  into  approximate  reasoning  problems  and 
methods  that  helps  clarify  many  of  the  fundamental  issues  surrounding  the  nature  and 
usefulness  of  different  methodologies. 

Through use of constructs based on possible-world formalisms, it is easy to see that all
existing  techniques  produce  correct  and  sound  descriptions  of  the  properties  of  the  subset 
of  possible  worlds  that  are  consistent  with  observed  evidence  rather  than,  as  sometimes 
thought,  ad  hoc  characterizations  of  an  ambiguously  relaxed  notion  of  truth. 

Furthermore,  these  formalizations  underscore  the  basic  relations  between  probabilistic 
techniques  showing  that  the  Dempster-Shafer  calculus  of  evidence  is  fully  consistent  with 
the  theory  of  probability.  By  contrast,  these  models  also  reveal  basic,  substantial  differences 
between  probabilistic  and  possibilistic  methods  —  the  former  related  to  set  measures  that 
characterize  the  frequency  of  occurrence  of  some  event,  and  the  latter  linked  to  notions  of 
similarity  between  possible  situations.  From  this  viewpoint  it  is  evident  that  possibilistic 
and  probabilistic  techniques  should  not  be  regarded  as  competing  tools  but,  rather,  as 
complementary  techniques  seeking  to  describe  different  properties  of  sets  of  possible  worlds. 

Finally,  it  is  important  to  point  out  that  possible-world  semantics  also  helps  to  clarify  the 
characteristics  and  purposes  of  nonmonotonic  and  commonsense  approaches  to  deductive 
inference. 

Abstract


This  note  presents  a  formal  semantic  characterization  of  the  major  concepts  and  constructs  of 
fuzzy  logic  in  terms  of  notions  of  distance,  closeness,  and  similarity  between  pairs  of  possible  worlds. 
The  formalism  is  a  direct  extension  (by  recognition  of  multiple  degrees  of  accessibility,  conceivability, 
or  reachability)  of  the  major  modal  logic  concepts  of  possible  and  necessary  truth. 

Given  a  function  that  maps  pairs  of  possible  worlds  into  a  number  between  0  and  1,  generalizing 
the conventional concept of an equivalence relation, the major constructs of fuzzy logic (i.e.,
conditioned and unconditional possibility distributions) are defined in terms of this generalized similarity
relation using familiar concepts from the mathematical theory of metric spaces. This interpretation
is different in nature and character from the typical, chance-oriented, meanings associated with
probabilistic concepts, which are grounded on the mathematical notion of set measure. The similarity
structure  defines  a  topological  notion  of  continuity  in  the  space  of  possible  worlds  (and  in  that  of  its 
subsets,  i.e.,  propositions)  that  allows  a  form  of  logical  “extrapolation”  between  possible  worlds. 

This logical extrapolation operation corresponds to the major deductive rule of fuzzy logic
(the compositional rule of inference or generalized modus ponens of Zadeh), an inferential
operation that generalizes its classical counterpart by virtue of its ability to be utilized when propositions
representing available evidence only match approximately the antecedents of conditional
propositions. The relations between the similarity-based interpretation of the role of conditional possibility
distributions  and  the  approximate  inferential  procedures  of  Baldwin  are  also  discussed. 

A  straightforward  extension  of  the  theory  to  the  case  where  the  similarity  scale  is  symbolic 
rather  than  numeric  is  described.  The  problem  of  generating  similarity  functions  from  a  given  set  of 
possibility  distributions,  with  the  latter  interpreted  as  defining  a  number  of  (graded)  discernibility 
relations  and  the  former  as  the  result  of  combining  them  into  a  joint  measure  of  distinguishability 
between  possible  worlds,  is  briefly  discussed. 

1  INTRODUCTION


This  note  presents  a  semantic  characterization  of  the  major  concepts  and  constructs  of  fuzzy  logic 
in  terms  of  notions  of  similarity,  closeness,  and  proximity  between  possible  states  of  a  system  that 
is being reasoned about. Informally, a “possible state” (to be formalized later using the notion of
“possible world”) is an assignment of a well-defined truth value (i.e., either true or false) to all
relevant  declarative  knowledge  statements  about  that  system. 

The  primary  goal  that  guided  the  research  leading  to  the  results  presented  in  this  work  has  been 
one of conceptual clarification. A great deal of energy has been directed in the past few years to debating
the methodological necessity and relative merits of various approximate reasoning methodologies. As
a result of these exchanges, the need to consider certain nonclassical approaches has been questioned
on  a  variety  of  bases. 

Recognizing  the  need  for  the  development  of  sound  semantic  formalisms  that  shed  light  on  the 
nature  of  different  approaches,  the  author  has  pursued,  in  the  past  few  years,  a  line  of  theoretical 
research seeking to describe various approximate reasoning methodologies using a common
framework. These investigations have recently shown the close connection between the Dempster-Shafer
calculus  of  evidence  [35]  and  epistemic  logics.  This  relationship  was  elucidated  by  straightforward 
application  of  conventional  probabilistic  concepts  to  models  of  knowledge-states  that  distinguish 
between  the  truth  of  a  proposition  and  knowledge  (by  rational  agents)  of  that  truth.  Central  to 
this development is the notion of “possible world” used by Carnap [6] to develop logical bases for
probability  theory. 

The  same  central  notion  of  possible  state  of  affairs  is  also  the  conceptual  basis  of  the  results 
presented  in  this  note,  which  is  aimed  at  establishing  the  semantic  bases  of  possibilistic  logic  with 
emphasis  on  the  study  of  its  possible  relations  and  differences,  if  any,  with  probabilistic  reasoning. 

The  results  of  this  investigation  clearly  show  that  possibilistic  logic  can  be  interpreted  in  terms 
of nonprobabilistic concepts that are related to the notions of continuity and proximity. The major
functional structures of fuzzy logic, i.e., possibility and necessity distributions,¹ may be defined in
terms of the more primitive notion of similarity between possible states of a system using constructs
that are the direct extension of well-known concepts in the theory of metric spaces. The topological
metric structure that is so defined may be used to derive a sound inferential rule that is a form
of logical “extrapolation.” This rule is also shown to be the compositional rule of inference or
generalized modus ponens proposed by Zadeh [53]. Conversely, possibility distributions, expressing
resemblance from some specific regard, may be used to derive the actual similarity functions,
which discern between possible worlds from the joint viewpoint of several respects.

The  constructs  that  are  used  to  derive  the  interpretation  presented  in  this  note  are  formally, 
structurally,  and  conceptually  different  from  those  that  explain  probabilistic  reasoning,  in  either 
its  objective  or  subjective  interpretations,  irrespective  of  methodological  reliance  on  interval-based 
approaches to represent ignorance. The latter class of methods, measuring the relative proportion
of (either observed or believed) occurrence of some event, is based on the mathematical notion of
set measure, while the former, seeking to establish similarities between situations that may be used
for analogical reasoning, is related to the theory of distances and metric spaces.

¹ It is important to remark that the scope of this work is limited to the most fundamental concepts and constructs
of fuzzy logic without examining related notions such as, for example, generalized quantifiers.

This  presentation  of  the  relationships  between  similarity-based  concepts  and  possibilistic  notions, 
while  grounded  on  a  formal  treatment  that  is  based  on  rigorous  logical  and  mathematical  formalisms, 
will  be  kept  at  a  level  that  is  as  informal  as  possible.  The  purpose  of  this  presentation  style  is 
to  facilitate  comprehension  of  major  ideas  without  the  clutter  that  would  need  to  be  otherwise 
introduced  to  keep  matters  strictly  precise.  For  this  reason,  we  will  refrain  from  formal  introduction 
of  structures  and  axiom  schemata,  that,  although  correct  and  proper,  may  encumber  understanding 
of  the  basic  concepts. 

Before  we  proceed  to  the  detailed  consideration  of  semantic  models,  I  must  briefly  remark  on 
the  epistemological  implication  of  these  developments.  The  present  interpretation  is  not  claimed 
to  be  the  only  one  that  may  be  advanced  to  define  the  notion  of  possibility  in  terms  of  simpler 
concepts,  nor  do  I  claim  that  it  may  not  be  sometimes  possible,  even  desirable,  to  model  possibilistic 
structures  from  other  bases.  My  intent  is  not  to  prove  the  conceptual  superiority  of  one  approach 
over another or to argue about the relative utility of different technologies. Rather, I hope that these
results contribute to establishing the basic conceptual differences in the treatment of imprecise
and uncertain information that are inherent in probabilistic and possibilistic methods; the former
oriented  toward  quantifying  believed  or  measured  frequency  of  occurrence,  and  the  latter  seeking  to 
determine  propositions — implied  by  the  evidence — that  are  similar,  in  some  sense,  to  a  hypothesis 
of  interest.  In  other  words,  beyond  accidental  domain-specific  relations,  both  types  of  methods  are 
needed  to  analyze  and  clarify  the  significance  of  imprecise  and  uncertain  information. 

2  APPROXIMATE  REASONING  AND  POSSIBLE  WORLDS

Our  point  of  departure  is  the  model-theoretic  formalisms  of  modal  logics.  Let  us  assume  that 
declarative  statements  about  the  state,  situation,  or  behavior  of  a  real-world  system  under  study 
are  symbolically  represented  by  the  letters  of  some  alphabet 

$$\mathcal{A} = \{p, q, r, \ldots\},$$

which are combined in the customary way using the logical operators $\neg$, $\vee$, $\wedge$, $\rightarrow$, and $\leftrightarrow$ (to be
interpreted with their usual meanings) to derive a language $\mathcal{L}$ (i.e., a collection of sentences).
Furthermore, we augment this language by use of two unary operators $N$ and $\Pi$, called the
necessity and possibility operators, respectively, having usage governed by the rule

    If $\phi$ is a sentence, then $N\phi$ and $\Pi\phi$ are also sentences,

introducing the ability to represent different modalities for the truth of propositions.

A  model  for  this  propositional  system  is  a  structure  consisting  of  three  components: 

1.  A  nonempty  set  of  possible  worlds  U  introduced  to  represent  states,  situations,  or  behaviors 
of  the  system  being  modeled  by  our  sentences.  In  what  follows  we  will  refer  to  this  set  as  the 
universe  of  discourse,  or  universe,  for  short. 

We will also need to consider a nonempty subset $\mathcal{E}$ of the universe $U$, which is introduced
to model the set of conceivable worlds that are consistent with observed evidence. This set
(possibly equal to the whole universe $U$) will be called the evidential set. Throughout this
note, we will assume that evidence about the world is always given by means of conventional
propositions that allow us to determine, without ambiguity, whether a possible world either is or
is not a member of the evidential set.²

2. A function (called a valuation) that assigns one and only one of the truth values true or false
to every possible world $w$ in the universe $U$ and every sentence $\phi$ in the language. Assignment
of the truth value true to a pair $(w, \phi)$ will be denoted $w \vdash \phi$ (i.e., $\phi$ is true in the world $w$).

In what follows, we will use the same symbols to describe subsets of possible worlds and the
propositions that are true only in worlds that are members of such subsets. For example, the
symbol $\mathcal{E}$ will be used to denote both the evidential set and the proposition that asserts the
validity of the corresponding evidential observations. Using this notation, for example, we
will write $w \vdash \mathcal{E}$ to indicate that the world $w$ is compatible (i.e., logically consistent) with the
evidence $\mathcal{E}$.

Furthermore, we will use the symbol $\mathcal{L}$, introduced above as a set of well-formed sentences,
to denote also the power set of the universe $U$. Rigorously, subsets of $U$ strictly correspond
to the classes of equivalence of the sentence set that are obtained by equating logically
equivalent sentences. In the same simplifying vein, we will drop also the customary distinction
between sentences (the linguistic expressions of something that may be true or false) and
propositions (the actual things being asserted).

² For the sake of simplicity, fuzzy evidential facts such as “Tom is rich,” usually considered in fuzzy logic, will not
be treated in this note. The meaning of such assertions will be discussed in a forthcoming paper.

3. A binary relation $R$, between possible worlds, called the accessibility, conceivability, or
reachability relation, introduced to model the semantics of the modal operators $N$ and $\Pi$.

It  is  not  necessary  to  review  here  the  well-known  axioms  [21]  that  restrict  the  assignment  of 
truth  values  to  well-formed  sentences  according  to  the  rules  of  propositional  logic.  To  facilitate 
comprehension  of  our  formalism,  we  need  to  recall  solely  the  rules  that  constrain  assignment  of 
truth  values  to  sentences  formed  by  prefixing  other  valid  expressions  with  the  modal  operators,  i.e., 

1. The sentence $\phi$ is necessarily true in the possible world $w$ (i.e., $w \vdash N\phi$) if and only if it is true
in every world $w'$ that is related to the world $w$ by the relation $R$.

2. The sentence $\phi$ is possibly true in the possible world $w$ (i.e., $w \vdash \Pi\phi$) if and only if it is true
in some world $w'$ that is related to the world $w$ by the relation $R$.
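
The two evaluation rules above translate directly into universal and existential quantification over related worlds. The following is a minimal sketch under stated assumptions: the three example worlds, the letters true in them, and the two accessibility relations are hypothetical and serve only to exercise the rules.

```python
# Each possible world is described by the set of letters true in it
# (hypothetical example worlds).
WORLDS = {
    "w1": {"p", "q"},
    "w2": {"p"},
    "w3": set(),
}

def holds(letter):
    """Valuation of an atomic sentence: true in w iff the letter is listed."""
    return lambda w: letter in WORLDS[w]

def necessarily(phi, w, related):
    """N(phi) at w: phi is true in every world related to w."""
    return all(phi(v) for v in WORLDS if related(w, v))

def possibly(phi, w, related):
    """Pi(phi) at w: phi is true in some world related to w."""
    return any(phi(v) for v in WORLDS if related(w, v))

def total(w, v):
    """Logical necessity: every pair of worlds is related."""
    return True

def restricted(w, v):
    """A narrower relation: only w1 and w2 are mutually accessible."""
    return {w, v} <= {"w1", "w2"}

p, q = holds("p"), holds("q")
```

Under `total`, `p` is not necessary at `w1` because it fails in `w3`; under `restricted` it is. Note that a world with no related worlds (here `w3` under `restricted`) makes every sentence vacuously necessary and none possible.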

If, for example, the relation $R$ relates worlds that share the same (possibly empty) subset of true
sentences of a prespecified set of expressions, i.e., $R(w, w')$ if and only if any sentence $\phi$ in that set
is either true in both $w$ and $w'$ or false in both
$w$ and $w'$, then the resulting system has an “epistemic” interpretation that regards related possible
worlds as “being possible for all we know” (i.e., observed evidence, corresponding to a subset of
that set, is the same for both worlds). In this case, the necessity operator $N$ corresponds to the epistemic
operator $K$ of epistemic logics, with the corresponding system having the properties of the modal
system S5, which was used, in the context of probability theory, as the semantic basis for the
Dempster-Shafer calculus of evidence [35].

If, on the other hand, the original interpretation of logical necessity (corresponding to a relation
$R$ that is equal to $U \times U$, i.e., that relates every pair of possible worlds) is given to the operator $N$,
then a proposition is necessarily true if and only if it is true in every possible world.

If  the  relation  R  is  chosen  as 

$$R = \mathcal{E} \times \mathcal{E},$$

then this interpretation may be used to characterize approximate reasoning problems as those where
a hypothesis of interest is neither necessarily true nor necessarily false in worlds in the evidential
set $\mathcal{E}$, reflecting the inability of conventional deductive techniques to unambiguously determine the
truth value of the hypothesis.³

In those problems, in spite of this fundamental impossibility, we may resort to approximate
reasoning methods to describe various properties of the evidential set $\mathcal{E}$. For example, the probabilistic
structures utilized by various probabilistic reasoning approaches typically characterize relations of
the form

$$p(H \wedge \mathcal{E}) : p(\neg H \wedge \mathcal{E}),$$

between the “measures” of the subsets of the evidential set $\mathcal{E}$ where a hypothesis $H$ is true or false,
respectively.
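
To make the notion of an approximate reasoning problem concrete, the sketch below builds a small evidential set, checks that a hypothesis is neither necessarily true nor necessarily false in it, and then computes the two measures whose ratio a probabilistic method would report. The universe, the proposition names, and the weights are invented for illustration.

```python
from fractions import Fraction

# Hypothetical universe: each world is the frozenset of propositions true
# in it, paired with an invented weight playing the role of a measure.
UNIVERSE = {
    frozenset({"bird", "flies"}): Fraction(6, 10),
    frozenset({"bird"}): Fraction(1, 10),
    frozenset({"flies"}): Fraction(1, 10),
    frozenset(): Fraction(2, 10),
}

def evidential_set(evidence):
    """Worlds of the universe that are consistent with the evidence."""
    return [w for w in UNIVERSE if evidence in w]

def is_approximate_problem(hypothesis, worlds):
    """True when the hypothesis is neither true in all worlds nor false in all."""
    necessarily_true = all(hypothesis in w for w in worlds)
    necessarily_false = all(hypothesis not in w for w in worlds)
    return not (necessarily_true or necessarily_false)

def measure_ratio(hypothesis, worlds):
    """Measures of the parts of the evidential set where H is true and false."""
    m_true = sum(UNIVERSE[w] for w in worlds if hypothesis in w)
    m_false = sum(UNIVERSE[w] for w in worlds if hypothesis not in w)
    return m_true, m_false

E = evidential_set("bird")
undetermined = is_approximate_problem("flies", E)
ratio = measure_ratio("flies", E)
```

Deduction alone cannot settle whether the bird flies, yet the pair of measures still describes the evidential set in a useful way.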

³ The notion of approximate reasoning problem is often extended to encompass situations where deductive
techniques cannot always be used because of practical limitations on computational resources.

Our  aim  will  be  to  study  how  other  structures,  defining  a  metric  or  distance  in  the  universe  U,
may  be  used  to  describe  the  nature  of  the  evidential  set.  To  do  so,  we  will  assign  a  different  meaning 
to  the  accessibility  relation,  giving  it  an  interpretation  that  regards  related  worlds  as  “similar”  or 
“close”  in  some  sense.  We  will  require,  however,  a  scheme  that  is  richer  than  that  provided  by  a 
single relation so that we can extend modal notions and derive semantic bases for fuzzy logic, which
relies  on  concepts  of  degrees  of  matching  or  closeness  expressed  by  real  numbers  between  0  and  1. 

In what follows we will use the symbols $\Rightarrow$ and $\Leftrightarrow$ to denote strong implication and equivalence,
respectively. A proposition $q$ strongly implies $p$ (denoted $q \Rightarrow p$) if and only if $p$ is true in any world
where $q$ is. Similarly, $p$ is logically equivalent to $q$ (denoted $p \Leftrightarrow q$) if and only if $p$ and $q$ are true in
the same subset of worlds of $U$.

Following traditional terminology, we will say also that a proposition $p$ is satisfiable if there exists
a possible world $w$ such that $w \vdash p$.

3  EXTENDED  MODALITIES


We first turn our attention to the problem of generalizing modal logic formalisms to explain the
structures and functions of fuzzy logic.

A number of authors have studied various relations between fuzzy and modal logics. Lakoff [24],
Murai et al. [28], and Schotch [36] have proposed graded generalizations of basic modal constructs.
Dubois and Prade [13,14] have also explored analogies between these nonstandard logics. In a recent
paper [12], they have developed, in addition, a modal basis for possibility theory by means of the
introduction of fuzzy structures into modal frameworks with the goal of deriving proof mechanisms
that may be used in possibilistic reasoning.

The goal for the model presented in this note is somewhat different from the objectives guiding
those efforts. We will seek explanations for possibilistic constructs on the basis of previously existing
notions rather than generalizations of modal frameworks by means of fuzzy constructs. The model
presented here is not based on the use of graded notions of possibility and necessity as primitive
(and, by implication, easy to understand) structures. The foundation for this model is provided
by a generalization of the accessibility relation, which is given a simple interpretation as a measure
of resemblance and proximity between possible worlds.

We will extend the notion of accessibility relation to encompass a family of nonempty binary
relations $R_\alpha$ that are indexed by a numerical parameter $\alpha$ between 0 and 1. These relations, which
are nested, i.e.,

$$R_\alpha \subseteq R_\beta, \quad \text{whenever } \beta \le \alpha,$$

are introduced to represent different degrees of similarity, using a scheme that is akin to that used
by Lewis in his study of counterfactuals [25]. The family of accessibility relations introduced here
differs from that proposed by Lewis, however, in its use of numerical indexes⁴ and in the nature
of the overall modeling goals that, in Lewis’ formalism, are intended to represent changes of scale
induced by consideration of different restrictive statements.

3.1  Similarity  Relations 

To  facilitate  the  definition  of  a  family  of  accessibility  relations  we  introduce  a  similarity  function 

$$S : U \times U \to [0, 1],$$

assigning to each pair of possible worlds $(w, w')$ a unique degree of similarity between 0 (corresponding
to maximum dissimilarity) and 1 (corresponding to maximum similarity).

With the help of this function, we will then say that $w$ and $w'$ are related to the degree $\alpha$,
denoted $R_\alpha(w, w')$, if and only if $S(w, w') \ge \alpha$. In this way, the relations $R_\alpha$ have the required
nesting property with $R_0$ corresponding to the whole Cartesian product $U \times U$ (or, every possible
world is at least similar in a degree zero to every other possible world).
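
The construction of the nested family from a similarity function can be sketched as follows. The three worlds and their similarity degrees are hypothetical; only the thresholding rule (two worlds are related to degree alpha when their similarity is at least alpha) comes from the text.

```python
WORLDS = ["w1", "w2", "w3"]

# Hypothetical similarity degrees for distinct worlds; keying by an
# unordered pair makes the function symmetric by construction.
DEGREES = {
    frozenset({"w1", "w2"}): 0.8,
    frozenset({"w2", "w3"}): 0.5,
    frozenset({"w1", "w3"}): 0.5,
}

def similarity(w, v):
    """Similarity between two worlds; a world is fully similar to itself."""
    return 1.0 if w == v else DEGREES[frozenset({w, v})]

def relation(alpha):
    """Accessibility relation of degree alpha, as a set of ordered pairs."""
    return {(w, v) for w in WORLDS for v in WORLDS if similarity(w, v) >= alpha}

r_0 = relation(0.0)    # the whole Cartesian product
r_06 = relation(0.6)   # only sufficiently similar pairs remain
r_1 = relation(1.0)    # here, just the diagonal
```

Raising the threshold can only remove pairs, which is the nesting property: `relation(1.0)` is contained in `relation(0.6)`, which is contained in `relation(0.0)`.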

⁴ We will later see that similarities may be measured using more general, nonnumeric, scales. For reasons of simplicity,
we will avoid at this point the introduction of more general schemes that unnecessarily complicate the exposition.

Some properties are required to assure that the function $S$ has the required semantics of a
metric relationship capturing the intuitive notion of similarity or “proximity.” It is first necessary
to demand that the degree of similarity between any world and itself be as high as possible, i.e.,

$$S(w, w) = 1, \quad \text{for all } w \text{ in } U.$$

This property assures that every one of the accessibility relations $R_\alpha$ will be reflexive and, following
the nomenclature introduced by Zadeh for fuzzy relations [52], we will also say that the similarity
relation is reflexive.

Next, we will call for the function $S$ to be symmetric, i.e.,

$$S(w, w') = S(w', w), \quad \text{for any worlds } w \text{ and } w' \text{ in } U.$$

This  is  a  very  natural  requirement  of  any  relation  intended  to  represent  a  relation  of  resemblance 
between  objects. 

Finally, and most importantly, we will impose a form of transitivity requirement upon the
similarity function $S$ that turns it into a generalized equivalence relation. The purpose of this restriction
is to assure that $S$ has a reasonable behavior as a metric in the universe of possible worlds. It would
certainly be surprising if, for some similarity $S$, we were to be told that $w$ and $w'$ are very similar
and that $w'$ and $w''$ are also very similar, but that $w$ does not resemble $w''$ at all. Clearly, there
should be a lower bound on the possible values of $S(w, w'')$ that may be expressed as a function of
the values of $S(w, w')$ and $S(w', w'')$. We will express such a constraint using a numeric operation,
denoted $\circledast$, that takes as arguments two real numbers between 0 and 1 and that returns another
number in the same range, i.e.,

$$\circledast : [0, 1] \times [0, 1] \to [0, 1],$$

in the form of the inequality

$$S(w, w'') \ge S(w, w') \circledast S(w', w''),$$

assumed valid for any worlds $w$, $w'$, and $w''$ in the universe $U$. Resorting again to modal terminology,
the above transitivity constraint, which will be called $\circledast$-transitivity, may be rewritten in relational
form as

$$R_{\alpha \circledast \beta} \supseteq R_\alpha \circ R_\beta, \quad \text{for all } 0 \le \alpha, \beta \le 1,$$

making obvious its generalization of the conventional definition of transitivity for ordinary binary
relations, i.e.,

$$R \supseteq R \circ R.$$
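
Whether a given similarity function satisfies the transitivity inequality depends on the operation chosen to combine the two degrees. The sketch below tests the inequality over all triples of a small hypothetical universe for two candidate operations, the minimum and max(x + y - 1, 0); both the similarity values and the choice of candidates are assumptions made here for illustration.

```python
from fractions import Fraction
from itertools import product

WORLDS = ["a", "b", "c"]

# Hypothetical similarity degrees (exact fractions avoid rounding issues).
DEGREES = {
    frozenset({"a", "b"}): Fraction(9, 10),
    frozenset({"b", "c"}): Fraction(9, 10),
    frozenset({"a", "c"}): Fraction(8, 10),
}

def similarity(w, v):
    """Reflexive and symmetric similarity on the example universe."""
    return Fraction(1) if w == v else DEGREES[frozenset({w, v})]

def is_transitive(op):
    """Check S(w, z) >= op(S(w, v), S(v, z)) for every triple of worlds."""
    return all(
        similarity(w, z) >= op(similarity(w, v), similarity(v, z))
        for w, v, z in product(WORLDS, repeat=3)
    )

def minimum(x, y):
    return min(x, y)

def bounded_difference(x, y):
    return max(x + y - 1, Fraction(0))

min_ok = is_transitive(minimum)
bounded_ok = is_transitive(bounded_difference)
```

For these values the inequality fails under the minimum (8/10 is below min(9/10, 9/10)) but holds under the second operation, which demands a weaker lower bound.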

Since the role of $\circledast$, through recursive application, is that of providing a lower bound for the
similarity between the two end members $w_1$ and $w_n$ of a chain of possible worlds $(w_1, w_2, \ldots, w_n)$,
it is obvious that the operation $\circledast$ should be commutative and associative. Furthermore, it should
also be nondecreasing in each argument, as it is reasonable to ask that the desired lower bound be
a monotonic function of its arguments. Finally, it is also desirable to ask that

$$\alpha \circledast 1 = 1 \circledast \alpha = \alpha,$$

i.e., that the values of the similarities of two indistinguishable objects to a third should be the same.
These requirements are equivalent to demanding that the operation $\circledast$ be a triangular norm [37],
or T-norm, for short.
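
The four requirements just listed (commutativity, associativity, monotonicity, and 1 as neutral element) can be tested numerically for a candidate operation. The checker below is a sketch that samples a finite grid of exact fractions, so a positive answer is evidence rather than proof; the grid and the three candidate operations are choices made here for illustration.

```python
from fractions import Fraction
from itertools import product

GRID = [Fraction(i, 10) for i in range(11)]

def minimum(a, b):
    return min(a, b)

def product_norm(a, b):
    return a * b

def lukasiewicz(a, b):
    return max(a + b - 1, Fraction(0))

def is_t_norm(op, grid=GRID):
    """Check the T-norm requirements on a finite grid of values in [0, 1]."""
    one = Fraction(1)
    for a, b in product(grid, repeat=2):
        value = op(a, b)
        if not 0 <= value <= 1:          # result stays in the unit interval
            return False
        if value != op(b, a):            # commutativity
            return False
    for a in grid:
        if op(a, one) != a:              # 1 is the neutral element
            return False
    for a, b, c in product(grid, repeat=3):
        if op(op(a, b), c) != op(a, op(b, c)):   # associativity
            return False
        if b <= c and op(a, b) > op(a, c):       # nondecreasing
            return False
    return True
```

An operation such as the arithmetic mean fails the check at once, because combining a degree with 1 changes it.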

Triangular norms, originally introduced in the theory of probabilistic metric spaces to treat
certain statistical problems, play a distinguished role in [0,1]-multivalued logics [1,11,17,31] as the
result of imposing reasonable requirements upon operations that produce the truth value of the
conjunction of two expressions as a function of the truth values of the conjuncts. Furthermore,
generalized similarity relations (called B-R relations by Zadeh [54]) also have an important function,
to be examined further later in this note, in the generalization of the inferential rule of modus
ponens [43,10]. Our axiomatic derivation for the requirement that $\circledast$ be a T-norm is based, however,
solely on metric considerations, applied here to a space of possible worlds, but is valid in general
metric spaces.

From  the  axioms  of  triangular  norms,  it  is  easy  to  see  that 

$$\alpha \circledast \beta \le \min(\alpha, \beta),$$

showing that the minimum function, itself a T-norm, is the largest element in this class of operations.
Its minimal element, on the other hand, is the noncontinuous function $\circledast$ defined by

$$\alpha \circledast \beta = \begin{cases} \alpha, & \text{if } \beta = 1, \\ \beta, & \text{if } \alpha = 1, \\ 0, & \text{otherwise.} \end{cases}$$

Every symmetric and reflexive relation is $\circledast$-transitive for this triangular norm, which is, therefore,
of little practical utility.

In what follows, we will also impose a most reasonable additional assumption of continuity of
$\circledast$ with respect to its arguments (i.e., why should there be a jump in the value of a lower bound
provided by $\circledast$ when the values of its arguments are slightly changed?). The class of continuous
T-norms does not have a minimal element, although under certain additional assumptions (requiring
T-norms to be also 2-copulas [37]), the inequality

$$\max(\alpha + \beta - 1, 0) \le \alpha \circledast \beta$$

also holds true, showing that certain important continuous T-norms lie between that of the $\aleph_1$-logic
of Lukasiewicz [17] and that of the original fuzzy logic proposed by Zadeh [53].
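
The ordering among these operations can be verified pointwise on a grid. The sketch below compares the noncontinuous minimal T-norm, the Lukasiewicz T-norm, the ordinary product (included here as one further example of a continuous T-norm; it is not discussed in the text), and the minimum. The grid resolution and function names are choices made for this illustration.

```python
from fractions import Fraction
from itertools import product

GRID = [Fraction(i, 20) for i in range(21)]

def drastic(a, b):
    """Minimal T-norm: nonzero only when one argument equals 1."""
    if b == 1:
        return a
    if a == 1:
        return b
    return Fraction(0)

def lukasiewicz(a, b):
    return max(a + b - 1, Fraction(0))

def product_norm(a, b):
    return a * b

def minimum(a, b):
    return min(a, b)

def dominated(lower, upper):
    """True when lower(a, b) <= upper(a, b) at every grid point."""
    return all(lower(a, b) <= upper(a, b) for a, b in product(GRID, repeat=2))

chain_holds = (
    dominated(drastic, lukasiewicz)
    and dominated(lukasiewicz, product_norm)
    and dominated(product_norm, minimum)
)
```

Since a larger operation yields a larger lower bound in the transitivity inequality, the minimum imposes the most demanding notion of transitivity and the minimal T-norm the least.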

Continuous triangular norms play a significant part in the theories of pattern recognition and
automatic classification. The author [33] proposed the use of generalized similarity relations based
on the T-norm of Lukasiewicz (called likeness relations in [33]) to generalize existing classification
techniques, which are based on the mapping of a similarity function into a conventional equivalence
relation, to the fuzzy domain by mapping these relations into generalized fuzzy partitions. Bezdek and
Harris [3] independently studied axiomatic approaches to cluster analysis based on the use of several
continuous T-norms.

The  author  has  also  studied  [34]  the  possible  relation  between  the  multivalued  logic  and  similarity 
related  aspects  of  T-norms,  and  suggested  that  the  degrees  of  similarity  between  two  objects  A  and 
B  may  be  regarded  as  the  “degree  of  truth”  of  the  vague  proposition 

“A  is  similar  to  B.” 

Having argued that S should have the structure of a generalized equivalence relation, we will
assume, mainly for reasons of simplicity, that the function S is the dual of a “true” distance, i.e.,
that

$$S(w, w') = 1 \text{ if and only if } w = w'.$$

This restriction, which is not substantial, is introduced primarily to assure that different possible
worlds may be distinguished by means of the function S. Otherwise, the equivalence relation that
relates two worlds w and w' if and only if $S(w, w') = 1$ may be used to partition our universe U into
“indistinguishable” nonintersecting classes, indicating that our metric cannot discriminate between
significant differences in system state.

Before closing our presentation of generalized similarity relations, it is important to remark upon
the close relation between the notion of similarity and that of distance. If a function $\delta$ is defined in
terms of a similarity function S by the simple relation

$$\delta = 1 - S,$$

then it is easy to see that the function $\delta$ has the properties of a metric or distance. This is evident
if the operation $\circledast$ corresponds to the T-norm of Lukasiewicz, since the transitivity condition is
equivalent to the well-known triangular inequality, i.e.,

$$\delta(w, w'') \le \delta(w, w') + \delta(w', w'').$$

If other T-norms are used, even stronger inequalities hold, with the so-called “ultrametric inequality”

$$\delta(w, w'') \le \max\big( \delta(w, w'), \delta(w', w'') \big)$$

being valid for the T-norm of Zadeh. In this case, each of the relations in the family $R_\alpha$ (known in
fuzzy set theory as the $\alpha$-cuts of the similarity S) is a conventional equivalence relation. This fact
was exploited, prior to the introduction of fuzzy set theory and fuzzy cluster analysis, by a variety
of clustering procedures of the “single-link” type [22,40].
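
The correspondence between transitivity and the metric inequalities can be illustrated on small similarity matrices. The sketch below is a hedged example: the two matrices (`S_line`, built from hypothetical points on a line, and `S_ultra`, a hand-made block-structured matrix) and all function names are assumptions made for illustration.

```python
from itertools import product


def t_lukasiewicz(a, b):
    return max(a + b - 1.0, 0.0)


def t_min(a, b):
    return min(a, b)


def is_transitive(S, tnorm, tol=1e-12):
    """Check S[i][k] >= tnorm(S[i][j], S[j][k]) for all triples."""
    idx = range(len(S))
    return all(S[i][k] >= tnorm(S[i][j], S[j][k]) - tol
               for i, j, k in product(idx, repeat=3))


def triangle_holds(S, tol=1e-12):
    """Triangular inequality for the dual distance delta = 1 - S."""
    idx = range(len(S))
    return all((1 - S[i][k]) <= (1 - S[i][j]) + (1 - S[j][k]) + tol
               for i, j, k in product(idx, repeat=3))


def ultrametric_holds(S, tol=1e-12):
    """Ultrametric inequality for the dual distance delta = 1 - S."""
    idx = range(len(S))
    return all((1 - S[i][k]) <= max(1 - S[i][j], 1 - S[j][k]) + tol
               for i, j, k in product(idx, repeat=3))


# Similarity induced by the usual distance between points of [0, 1].
points = [0.0, 0.2, 0.5, 0.9]
S_line = [[1.0 - abs(x - y) for y in points] for x in points]

# Block-structured similarity of the kind produced by single-link clustering.
S_ultra = [[1.0, 0.8, 0.3],
           [0.8, 1.0, 0.3],
           [0.3, 0.3, 1.0]]
```

In this example `S_line` is transitive for the T-norm of Lukasiewicz but not for the minimum, while `S_ultra` is transitive for both, in agreement with the two inequalities above.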

3.2  Possible  and  Necessary  Similarity 

Our  semantic  formalization  needs  require  the  introduction  of  constructs  to  indicate  the  extent  by 
which a concept exemplifies, illustrates, or is an adequate model of another concept. Our
interpretations shall, therefore, be oriented toward characterization of the degree by which a concept can
be  said  to  be  a  good  example  of  another  concept  with  the  purpose  of  defining  vague  concepts  by 
means  of  measures  of  proximity  between  defined  and  defining  concepts.  In  our  treatment,  each  of 
the  multiple  “definiens”  will  be  a  conventional  proposition  corresponding  to  a  subset  of  possible 
worlds.  It  is  conceivable,  however,  that  new  vague  concepts  might  also  be  described  by  indicating 
their  metric  relations  to  other  vague  concepts. 

The required constructs are based on the idea that whenever p and q are propositions such that
$p \Rightarrow q$, then any p-world is an “example” of a q-world. This basic notion will be generalized by the
introduction  of  modal  structures  that  define  to  what  degree  possible  worlds  that  satisfy  a  certain 
proposition  q  fit  a  vague  concept.  Some  of  those  possible  worlds  are  “paradigmatic”  of  the  vague 
concept,  i.e.,  they  fit  it  to  a  degree  equal  to  1  in  the  same  sense  that  we  may  say,  for  example,  in  an 
absolute  (i.e.,  nongraded)  sense  that  somebody  whose  height  is  7  ft  is  definitely  “tall.”  If  we  use  a 
notion  of  graded  fitness,  however,  certain  worlds  will  fit  the  concept  to  a  degree,  i.e.,  they  resemble 
(or  are  similar)  to  some  paradigmatic  example  of  the  vague  concept. 

The  conventional  interpretation  of  possibility  needs  to  be  modified,  therefore,  to  capture  the  idea 
that  a  particular  possible  world  is  similar  in  some  degree  to  another  world  that  satisfies  a  “reference” 
proposition. 

5 The $\alpha$-cut of a fuzzy set $\mu : U \to [0, 1]$ is the conventional set of all points w such that $\mu(w) \ge \alpha$. A similar
concept is defined for relations as subsets of a product space $U \times V$.

More generally, however, we will be interested in relations of similarity between pairs of subsets of
possible worlds rather than between pairs of possible worlds. This requirement complicates matters
considerably since we will be forced to consider both the “validity” of a proposition p in some world
where another proposition q is true, as well as its applicability in every world where q is true. In
the former case, we will care about the existence of q-worlds that are similar to some degree to some
p-world, while in the latter we will be concerned with the size of the minimum neighborhood of p
(as a subset of the universe U) that fully encloses the subset q.

This dual concern for what may possibly apply and what must necessarily hold, an essential
aspect of modal logic, is typical of situations where relationships between ensembles of objects are
described in terms of relations between their members. In the probability calculus, for example,
knowledge of probabilities over certain families of subsets provides “sharp” lower and upper bounds
(called inner and outer probabilities, respectively) for the probabilities of other subsets, an
important fact in the extension of set measures to larger domains [19]. The role and properties of these
bounds in the Dempster-Shafer calculus of evidence are well known, having been described in the
original paper of Dempster [8], related to concepts of modal logic by Ruspini [35], and being also the
subject of considerable formal study [7] as mathematical structures.

Analogies between the role of probabilistic bounds (i.e., bounds for probability values) and
possibility/necessity distributions, shown below to play a similar part with respect to metric
structures, have been the source of much of the confusion about the need for possibilistic schemes.
Each upper/lower-bound pair, however, leads to a substantially different description of the nature of a subset
of possible worlds, being, in either case, measures that arise naturally when pointwise properties are
extended to set partitions. General properties of these measures have been studied by Dubois and
Prade [11] in the context of approximate reasoning and in other regards by Pawlak [30].

Our generalizations of the notions of possibility and necessity are related to the so-called de re [21]
interpretation of the statement “If q, then p is possible” as the modal propositional relation

$$q \Rightarrow \Pi p.$$

We will say that the proposition q implies, or is a necessary model of, the proposition p to the
degree $\alpha$ if and only if for every q-world w there exists a p-world w' that is at least $\alpha$-similar to it
(i.e., $S(w, w') \ge \alpha$), or equivalently, whenever

$$q \Rightarrow \Pi_\alpha p.$$

Similarly, we will say that the proposition q is consistent with, or is a possible model of, the
proposition p to the degree $\alpha$ (see footnote 6) if and only if there exist a q-world w and a p-world w'
that are at least $\alpha$-similar, or equivalently, whenever

$$\neg ( p \Rightarrow \neg \Pi_\alpha q ).$$

The similarity function that we have introduced in the universe U provides us with a simple
mechanism to quantify both the extent of “inclusion” and that of the “intersection” between pairs
of subsets of possible worlds.7

6 Note that our characterizations of both possibility and necessity distributions are based on the modal possibility
operators $\Pi_\alpha$.

7 For reasons that by now should be evident, we will not need to introduce a concept of “unconditioned possibility”
although it would be easy to do so using $q = U$. Being concerned with the power of certain propositions to exemplify
other conditions, we will not have much occasion to deal with the strength of tautologies in that regard.

3.3 Possibilistic Implication and Consistence

The notion of subset inclusion and its related concept of set identity are of central importance in
deductive  logic,  since  subsets  of  possible  worlds  are  formally  equivalent  to  propositions  with  subset 
inclusion  and  identity  corresponding  to  logical  implication  and  equivalence,  respectively.  These 
propositional  relationships  are  the  basis  of  derivation  rules  such  as  the  modus  ponens.  The  notion 
of  intersection  plays  a  similar  role  in  modal  analyses  because  of  its  ability  to  express  the  potential 
validity  of  a  statement. 

Classical accounts, however, recognize only two “degrees” of inclusion corresponding to the cases
when either a set q is a subset of another set p or it is not, with a similar dichotomy applying to
degrees of intersection. Our generalization exploits the metric structures defined between sets of
possible worlds by introducing measures that describe a subset as enclosed in a neighborhood (of some
size) of another set while intersecting another of its neighborhoods (of “smaller” size).8 The problem
of measuring the “size” of those neighborhoods is the subject of our immediate considerations.

3.3.1 Degree of Implication

Our definition of partial implication between propositions was based on conditions that determine
whether, given two propositions p and q, one of them implies the other to some degree $\alpha$. In
particular, since every world w is always similar in a degree that is at least equal to zero to any
other world w', it is always true that any proposition q implies any other proposition p to the degree
zero. It is often the case, however, that the degree of implication between p and q is at least equal
to some certain positive value $\alpha$.

If we want to generalize procedures based on inclusion relationships, such as the modus ponens,
in an efficient fashion, we will need to measure the “optimal” (or maximum) value of the parameter $\alpha$
such that q implies p to the degree $\alpha$. This value is a measure of the degree by which the set of all
p-worlds must be “stretched” to encompass the set of all q-worlds. The greatest lower bound, over
all q-worlds w', of the similarity between w' and its closest p-world w (which depends, in general, on
w') is given by the degree of implication function:

Definition:  The  degree  of  implication  of  p  by  q  is  the  value 

$$I(p \mid q) = \inf_{w' \vdash q} \, \sup_{w \vdash p} S(w, w').$$


Defined in this way, the degree of implication $I(p \mid q)$ is a measure of the “minimal amount” of
stretching required to reach a p-world from any q-world, in the sense that if $\alpha = I(p \mid q)$, then

$$q \Rightarrow \Pi_\alpha p.$$

Furthermore, $\alpha$ is the largest real value for which the above statement may be made.
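
On a finite universe the infimum and supremum in this definition become a minimum and a maximum, so the degree of implication can be computed directly. The following sketch assumes, purely for illustration, that worlds are points of the unit interval with similarity $1 - |x - y|$; the names `similarity` and `degree_of_implication` are introduced here and are not part of the original text.

```python
def similarity(x, y):
    """Hypothetical similarity between worlds represented as points of [0, 1]."""
    return 1.0 - abs(x - y)


def degree_of_implication(p_worlds, q_worlds, sim=similarity):
    """I(p | q): min over q-worlds w' of the max over p-worlds w of S(w, w')."""
    return min(max(sim(w, w_prime) for w in p_worlds)
               for w_prime in q_worlds)


p = [0.0, 0.2]   # worlds where p holds
q = [0.2, 0.5]   # worlds where q holds

i_pq = degree_of_implication(p, q)   # worst-case q-world is 0.5; nearest p-world is 0.2
i_qp = degree_of_implication(q, p)   # worst-case p-world is 0.0; nearest q-world is 0.2
```

With these sets, `i_pq` is approximately 0.7 and `i_qp` approximately 0.8, which also shows that the function is not symmetric in its arguments.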

As the following theorem makes clearer, this function provides the basis for the generalization
of  the  modus  ponens.  This  truth-derivation  procedure  may  be  thought  of  as  an  expression  of  the 
nesting  relationships  that  hold  between  the  sizes  of  neighborhoods  of  such  subsets. 

8 It is important to recall that, due to our reliance on similarity rather than on the dual notion of dissimilarity or
distance, high values of $\alpha$ correspond to low values of “stretching” or to smaller set neighborhoods.

Theorem: The degree of implication function,

$$I : \mathcal{L} \times \mathcal{L} \to [0, 1],$$

has the following properties:

(i) If $p \Rightarrow r$, then $I(p \mid q) \le I(r \mid q)$

(ii) If $q \Rightarrow r$, then $I(p \mid q) \ge I(p \mid r)$

(iii) $I(p \mid q) \ge I(p \mid r) \circledast I(r \mid q)$

where p, q, and r are any satisfiable propositions.

Proof: The first two properties are an immediate consequence of the definition of degree of
implication. To prove the third, observe that by definition of similarity

$$S(w, w') \ge S(w, w'') \circledast S(w'', w')$$

for any worlds w, w', and w''.

Taking the supremum on both sides of this inequality with respect to all worlds $w \vdash p$, it follows,
because $\circledast$ is continuous, that

$$\sup_{w \vdash p} S(w, w') \ge \Big[ \sup_{w \vdash p} S(w, w'') \Big] \circledast S(w'', w').$$

Since this expression is true, in particular, for all worlds $w'' \vdash r$, it is true that

$$\sup_{w \vdash p} S(w, w') \ge \Big[ \inf_{w'' \vdash r} \sup_{w \vdash p} S(w, w'') \Big] \circledast S(w'', w') = I(p \mid r) \circledast S(w'', w'),$$

where w'' is any world such that $w'' \vdash r$.

From this inequality, it follows, since $\circledast$ is continuous, that

$$\sup_{w \vdash p} S(w, w') \ge I(p \mid r) \circledast \Big[ \sup_{w'' \vdash r} S(w'', w') \Big].$$

Taking now the infimum on both sides of this expression over all worlds w' such that $w' \vdash q$, it is
easy to see, using again the continuity of $\circledast$, that

$$\inf_{w' \vdash q} \sup_{w \vdash p} S(w, w') \ge I(p \mid r) \circledast \Big[ \inf_{w' \vdash q} \sup_{w'' \vdash r} S(w'', w') \Big] = I(p \mid r) \circledast I(r \mid q),$$

proving the $\circledast$-transitivity of I. ∎

Note that, since $I(q \mid q) = 1$ for any proposition q, the following statement is also true:

Corollary: If p and q are propositions in $\mathcal{L}$, then

$$I(p \mid q) = \sup_{r} \big[ I(p \mid r) \circledast I(r \mid q) \big].$$

Notice also that if $I(p \mid q) = 1$, then

$$\sup_{w \vdash p} S(w, w') = 1, \quad \text{for all } w' \vdash q.$$

Under minimal assumptions (assuring that the supremum operation is actually a maximization),
this relation is equivalent to stating that q strongly implies p, or that any q-world is also a p-world.
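
The transitivity property of the theorem and the identity of the corollary can be checked exhaustively on a small finite universe. The sketch below is an illustration under stated assumptions: worlds are hypothetical points of the unit interval, the similarity is $1 - |x - y|$ (which is transitive for the T-norm of Lukasiewicz), and propositions are identified with nonempty subsets of the universe. All names are introduced here.

```python
from itertools import combinations


def t_lukasiewicz(a, b):
    """T-norm of Lukasiewicz: max(a + b - 1, 0)."""
    return max(a + b - 1.0, 0.0)


def similarity(x, y):
    """Hypothetical similarity on [0, 1]; Lukasiewicz-transitive."""
    return 1.0 - abs(x - y)


def implication(p, q):
    """I(p | q) on finite sets of worlds."""
    return min(max(similarity(w, v) for w in p) for v in q)


def nonempty_subsets(universe):
    """All satisfiable propositions, as nonempty subsets of the universe."""
    return [list(c) for k in range(1, len(universe) + 1)
            for c in combinations(universe, k)]


def transitivity_holds(universe, tnorm=t_lukasiewicz, tol=1e-9):
    """Property (iii): I(p | q) >= I(p | r) (*) I(r | q) for all p, q, r."""
    props = nonempty_subsets(universe)
    return all(
        implication(p, q) >= tnorm(implication(p, r), implication(r, q)) - tol
        for p in props for q in props for r in props
    )


def corollary_holds(universe, tnorm=t_lukasiewicz, tol=1e-9):
    """Corollary: I(p | q) equals the max over r of I(p | r) (*) I(r | q)."""
    props = nonempty_subsets(universe)
    return all(
        abs(implication(p, q)
            - max(tnorm(implication(p, r), implication(r, q)) for r in props)) <= tol
        for p in props for q in props
    )


universe = [0.0, 0.2, 0.5, 0.9]
```

The maximum in the corollary is attained at r = q, which is the observation used in the text to derive it from the theorem.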

The nonsymmetric function I measures the extent by which every world w' in a certain class
resembles some world w (dependent on w') in a reference class, possibly explicating the nature of
the nonsymmetric assessments [45] found in psychological experimentation when subjects are asked
to evaluate the degree by which an object “resembles” another. The results obtained in those
experiments suggest that human beings, when assessing similarity between objects, use one of them
(or a class of similar objects) as a reference landmark to describe the other. Such asymmetries might
be explained by noticing that, in general, $I(p \mid q) \ne I(q \mid p)$, indicating that the stronger stimulus
might generally be used to construct a reference class, which is then used to describe other stimuli.

The  degree  of  implication  of  one  proposition  by  another  can  be  readily  used  to  generate  a  measure 
of  similarity  between  propositions  that  generalizes  our  original  measure  of  similarity  between  possible 
worlds: 

$$S(p, q) = \min\big[ I(p \mid q), \, I(q \mid p) \big],$$

quantifying the degree by which the propositions p and q are equivalent.

It may be readily proved [44], from its definition and from the transitivity property of I, that S is
a reflexive, symmetric, and $\circledast$-transitive function between subsets of possible worlds. This similarity
function is the dual of the well-known Hausdorff distance, defined between subsets of a metric space as a
function of the distance between pairs of their members [9], which is given by the expression

$$\delta(A, B) = \max\Big[ \sup_{x \in A} \inf_{y \in B} \delta(x, y), \ \sup_{y \in B} \inf_{x \in A} \delta(x, y) \Big].$$
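
The duality with the Hausdorff distance can be verified numerically on finite sets. In the sketch below, which is only an illustration, worlds are hypothetical points of the real line, the distance is $|x - y|$, and the similarity is its dual $1 - |x - y|$; the function names are assumptions introduced here.

```python
def delta(x, y):
    """Hypothetical distance between worlds represented as points of [0, 1]."""
    return abs(x - y)


def similarity(x, y):
    """Dual similarity S = 1 - delta."""
    return 1.0 - delta(x, y)


def implication(p, q):
    """I(p | q) on finite sets of worlds."""
    return min(max(similarity(w, v) for w in p) for v in q)


def set_similarity(p, q):
    """Similarity between propositions: min[I(p | q), I(q | p)]."""
    return min(implication(p, q), implication(q, p))


def hausdorff(a, b):
    """Hausdorff distance between finite sets a and b."""
    return max(max(min(delta(x, y) for y in b) for x in a),
               max(min(delta(x, y) for x in a) for y in b))


p = [0.0, 0.2]
q = [0.2, 0.5]
```

For these sets the Hausdorff distance is approximately 0.3 and the similarity between the propositions is approximately 0.7, so that the two values add up to 1.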

The  result  expressed  by  the  transitive  property  of  the  degree  of  implication  may  be  stated  using 
modal  notation  in  the  form 

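$$q \Rightarrow \Pi_\alpha r \ \text{ and } \ r \Rightarrow \Pi_\beta p \ \text{ imply that } \ q \Rightarrow \Pi_{\alpha \circledast \beta} \, p,$$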

as  the  simplest  form  of  the  generalized  modus  ponens  rule  of  Zadeh. 

The relationship between this rule and the classical modus ponens is easier to perceive if it is
remembered that classical conditional propositions of the form “If q, then p” simply state that the
set of q-worlds is a subset of the set of p-worlds. Such relationships of inclusion may also be described
in metric terms by saying that every q-world has a p-world (i.e., itself) that is as similar as possible
to it.

Logic structures, however, only allow us to say that either q implies p or that q implies its negation
$\neg p$, or that neither of those statements is true. By contrast, similarity relations allow measurement
of the amount by which a set must be “stretched” (as illustrated in Figure 1) to enclose another
set. Using such metrics, we may describe the generalized modus ponens as a relation between the
stretching required to reach p from any point of the set r, the stretching required to reach r from
any point of the set q, and the stretching required to reach p from any point of the set q.

In Section 5 we will derive alternative expressions for the generalized modus ponens that allow
the propagation of both measures characterizing degree of implication and degree of consistence, a dual
concept that plays, with respect to the notion of possibility, the function that is fulfilled by the
degree of implication function with respect to necessity. In those derivations, by introduction of
sharper bounds for certain conditional concepts, we will also be able to improve the quality of the
bounds provided by generalized modus ponens rules while being closer in spirit to its usual fuzzy-logic
formulation.

Figure 1: The Generalized Modus Ponens.

3.3.2 Degree of Consistence

A notion that is dual to that of degree of implication is given by a function that measures the
pointwise proximity between pairs of possible worlds from an “optimistic” point of view, characterizing
the degree by which statements that are true in some worlds may apply in others. By contrast, the
degree of implication measures the extent by which statements that are true in p-worlds must hold
in q-worlds.

Definition: The degree of consistence of p and q is the value

$$C(p \mid q) = \sup_{w \vdash p} \, \sup_{w' \vdash q} S(w, w').$$


An immediate consequence of this definition is that $C(\cdot \mid \cdot)$ is a symmetric function that is
increasingly monotonic in both arguments (with respect to the $\Rightarrow$ order). It is also easy to see that the
values of the degree of consistence function are never smaller than the corresponding values of the degree
of implication function,

$$I(p \mid q) \le C(p \mid q),$$

as the amount of stretching required to reach p from some “convenient” q-world is smaller (i.e.,
higher values of S) than that required to reach p from any q-world. In general, however, the degree
of consistence function is not transitive, preventing the statement of a “compatibility” counterpart of
the generalized modus ponens rule. Its relationship with the degree of implication function, given
by the expression

$$C(p \mid q) = \sup_{w' \vdash q} I(p \mid w') = \sup_{w \vdash p} I(q \mid w),$$

will permit us, nonetheless, to derive a useful bound-propagation expression.
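
These properties of the degree of consistence can be illustrated on finite sets. The following sketch assumes, for illustration only, worlds represented as points of the unit interval with similarity $1 - |x - y|$; the names and the particular sets are hypothetical. It computes the degree of consistence, compares it with the degree of implication, recovers it from degrees of implication by single worlds, and exhibits a failure of transitivity.

```python
def similarity(x, y):
    """Hypothetical similarity between worlds represented as points of [0, 1]."""
    return 1.0 - abs(x - y)


def implication(p, q):
    """I(p | q) on finite sets of worlds."""
    return min(max(similarity(w, v) for w in p) for v in q)


def consistence(p, q):
    """C(p | q): similarity of the closest pair of a p-world and a q-world."""
    return max(similarity(w, v) for w in p for v in q)


p, q = [0.0, 0.2], [0.5, 0.9]
c_pq = consistence(p, q)   # closest pair is (0.2, 0.5)
i_pq = implication(p, q)   # worst-case q-world is 0.9

# C(p | q) as a supremum of degrees of implication by single worlds.
c_from_q = max(implication(p, [v]) for v in q)
c_from_p = max(implication(q, [w]) for w in p)

# Failure of transitivity: r2 intersects both p2 and q2, which are far apart.
p2, r2, q2 = [0.0], [0.0, 0.9], [0.9]
```

Here `consistence(p2, r2)` and `consistence(r2, q2)` are both 1, while `consistence(p2, q2)` is approximately 0.1, so no T-norm combination of the first two values bounds the third from below.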
4 POSSIBILITY AND NECESSITY DISTRIBUTIONS


This section presents interpretations of the major constructs of fuzzy logic (possibility and necessity
distributions) in terms of similarity-based structures. Possibility and necessity distributions are
functions that measure the proximity of either all or some of the worlds in the evidential set to
worlds in other sets that are employed as reference landmarks.

The role played by possibility and necessity distributions is similar to that performed by lower
and upper bounds of probability distributions (or by the belief and plausibility functions of the
Dempster-Shafer calculus of evidence) with respect to probability distributions. The essential
difference between these bounds and those provided by possibility/necessity pairs lies in the fundamentally
dissimilar character of what is being bounded: metric structures relating pairs of worlds in one case,
measures of set size in the other. Furthermore, in the model of possibilistic structures that is
presented in this note, necessity (possibility) distributions are any lower (upper) bounds of certain
metric functions rather than their “best” or “sharpest” bounds. The operations of fuzzy logic allow
computation of bounds for some of these measures as a function of bounds of other measures.

4.1  Inverse  of  a  Triangular  Norm 

When working in ordinary metric spaces, it is often convenient to express the conventional statement
of the triangular inequality, i.e.,

$$\delta(w, w') \le \delta(w, w'') + \delta(w'', w'),$$

in the equivalent form

$$\delta(w, w') \ge | \, \delta(w, w'') - \delta(w', w'') \, |,$$

which utilizes a form of inverse (i.e., the subtraction operator $-$) of the function used to express
the original inequality (i.e., the addition operator $+$). This notion of inverse may be directly
generalized [37] to provide us with the tools required to define possibility and necessity functions and to
derive useful forms of the generalized modus ponens involving either type of these constructs.

Definition: If $\circledast$ is a triangular norm, its pseudoinverse $\oslash$ is the function defined over pairs of
numbers in the unit interval of the real line by the expression

$$a \oslash b = \sup \{ \, c : b \circledast c \le a \, \}.$$

From this definition it is clear that $a \oslash b$ is nondecreasing in a and nonincreasing in b. Furthermore,
$a \oslash 0 = 1$ and $a \oslash 1 = a$ for any a in [0, 1]. Other important properties of the pseudoinverse function
are given in the works of Schweizer and Sklar [37], Trillas and Valverde [43], and Valverde [44].

Examples of the pseudoinverses of important triangular norms are given in Table 1 together with
the corresponding conorms.

Table 1: Triangular Norms, Conorms, and Pseudoinverses

Name          T-Norm $a \circledast b$    Conorm $a \oplus b$    Pseudoinverse $a \oslash b$
Lukasiewicz   max(a + b - 1, 0)           min(a + b, 1)          min(1 + a - b, 1)
Product       ab                          a + b - ab             a/b if b > a; 1 otherwise
Zadeh         min(a, b)                   max(a, b)              a if b > a; 1 otherwise
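
The closed forms of Table 1 can be compared against the defining supremum. The sketch below is illustrative: the function names and the grid-based approximation `pinv_numeric` are assumptions introduced here, and the grid search only approximates the supremum to within the grid spacing.

```python
def t_lukasiewicz(a, b):
    return max(a + b - 1.0, 0.0)


def t_product(a, b):
    return a * b


def t_min(a, b):
    return min(a, b)


def pinv_lukasiewicz(a, b):
    """Pseudoinverse of the T-norm of Lukasiewicz."""
    return min(1.0 + a - b, 1.0)


def pinv_product(a, b):
    """Pseudoinverse of the product T-norm."""
    return a / b if b > a else 1.0


def pinv_min(a, b):
    """Pseudoinverse of the T-norm of Zadeh."""
    return a if b > a else 1.0


def pinv_numeric(tnorm, a, b, steps=1000, tol=1e-12):
    """Approximate sup{c in [0, 1] : tnorm(b, c) <= a} on a uniform grid."""
    return max(c / steps for c in range(steps + 1)
               if tnorm(b, c / steps) <= a + tol)


pairs = [(t_lukasiewicz, pinv_lukasiewicz),
         (t_product, pinv_product),
         (t_min, pinv_min)]
```

For example, with a = 0.3 and b = 0.6 the three pseudoinverses are approximately 0.7, 0.5, and 0.3, and each agrees with the grid approximation of the supremum.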

4.2 Unconditioned Necessity Distributions

We introduce first a family of functions that bound from below the value of the similarity between
any evidential world in $\mathcal{E}$ and some world where another proposition p is true. These unconditioned
necessity distributions are lower bounds for values of the degree of implication $I(p \mid \mathcal{E})$, which
measures the extent by which statements that are true in a reference set (i.e., the subset of p-worlds)
must hold in the evidential set.

As observed before, whenever $I(p \mid \mathcal{E}) = 1$, it is true, under minimal assumptions, that the
evidential subset $\mathcal{E}$ is a subset of the set of all p-worlds, or that p necessarily holds in $\mathcal{E}$. If, on
the other hand, $I(p \mid \mathcal{E}) = \alpha < 1$, then p must be stretched a certain amount (with smaller $\alpha$
corresponding to larger stretching) in order for one of its neighborhoods to encompass $\mathcal{E}$.

Definition: If $\mathcal{E}$ is an evidential set, then a function Nec(·) defined over propositions in the
language $\mathcal{L}$ is called an unconditioned necessity distribution for $\mathcal{E}$ if

$$\mathrm{Nec}(p) \le I(p \mid \mathcal{E}).$$

4.3 Unconditioned Possibility Distributions

The dual counterpart of the unconditioned necessity distribution is provided by upper bounds of
the degree of consistence $C(p \mid \mathcal{E})$. Whenever $C(p \mid \mathcal{E}) = 1$, it is easy to see that, under minimal
assumptions, there exists a p-world w that is in the evidential set $\mathcal{E}$ or, equivalently, that p (for all
we know) is possibly true. If, on the other hand, $C(p \mid \mathcal{E}) = \alpha < 1$, then there exists a neighborhood
(of “size” $\alpha$) of some p-world that intersects the evidential set.

Definition: If $\mathcal{E}$ is an evidential set, then a function Poss(·) defined over propositions in the
language $\mathcal{L}$ is called an unconditioned possibility distribution for $\mathcal{E}$ if

$$\mathrm{Poss}(p) \ge C(p \mid \mathcal{E}).$$

Since the value Poss(p) of any possibility function Poss(·) is an upper bound of the value
$C(p \mid \mathcal{E})$ of the degree of consistence, while the corresponding value Nec(p) of any necessity function
Nec(·) is a lower bound of $I(p \mid \mathcal{E})$, it follows that values of a possibility function can never be smaller
than the corresponding values of any necessity function, i.e., that

$$\mathrm{Nec}(p) \le \mathrm{Poss}(p).$$

4.4 Properties of Possibility and Necessity Distributions

In this subsection we will develop similarity-based interpretations for some basic formulae of
possibilistic calculus. These expressions may be thought of as mechanisms that allow the extension of a
partially known possibility distribution. For example, the property that

$$\max(\mathrm{Poss}(p), \mathrm{Poss}(q)) \ge C(p \lor q \mid \mathcal{E}),$$

which is proved below, is the similarity interpretation of the standard rule that allows computation
of the possibility value of a disjunction in fuzzy logic, i.e.,

$$\mathrm{Poss}(p \lor q) = \max(\mathrm{Poss}(p), \mathrm{Poss}(q)).$$

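Theorem: If p and q are propositions, and if the quantities Poss(p), Poss(q), Nec(p), and Nec(q)
are such that

$$\mathrm{Nec}(p) \le I(p \mid \mathcal{E}), \qquad \mathrm{Nec}(q) \le I(q \mid \mathcal{E}),$$

$$\mathrm{Poss}(p) \ge C(p \mid \mathcal{E}), \qquad \mathrm{Poss}(q) \ge C(q \mid \mathcal{E}),$$

then the following statements (similarity-based interpretations of the basic laws of fuzzy logic) are
valid:

$$\max(\mathrm{Nec}(p), \mathrm{Nec}(q)) \le I(p \lor q \mid \mathcal{E}),$$
$$\max(\mathrm{Poss}(p), \mathrm{Poss}(q)) \ge C(p \lor q \mid \mathcal{E}),$$
$$\min(\mathrm{Poss}(p), \mathrm{Poss}(q)) \ge C(p \land q \mid \mathcal{E}).$$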

Proof: Note first that since $C(\cdot \mid \cdot)$ is nondecreasing (with respect to the $\Rightarrow$ order) in its
arguments, it is true that

$$\mathrm{Poss}(p) \ge C(p \mid \mathcal{E}) \ge C(p \land q \mid \mathcal{E}),$$

$$\mathrm{Poss}(q) \ge C(q \mid \mathcal{E}) \ge C(p \land q \mid \mathcal{E}),$$

whenever $p \land q$ is satisfiable, from which it is easy to see that

$$\min(\mathrm{Poss}(p), \mathrm{Poss}(q)) \ge C(p \land q \mid \mathcal{E}).$$

The corresponding result is obvious when $p \land q$ is nonsatisfiable.

A similar argument shows, for necessity functions, that

$$\max(\mathrm{Nec}(p), \mathrm{Nec}(q)) \le I(p \lor q \mid \mathcal{E}).$$

To prove the disjunctive law for possibilities, notice that if f is any function mapping elements
of a general domain D into real numbers, then

$$\sup \{ f(d) : d \in A \cup B \} = \max \big[ \sup \{ f(d) : d \in A \}, \ \sup \{ f(d) : d \in B \} \big].$$

From this equality, it is easy to see that if Poss(p) and Poss(q) are upper bounds of $C(p \mid \mathcal{E})$
and $C(q \mid \mathcal{E})$, respectively, then

$$\max(\mathrm{Poss}(p), \mathrm{Poss}(q)) \ge C(p \lor q \mid \mathcal{E}),$$

completing the proof of the theorem. ∎

Note, however, that another law commonly given as an axiom for necessity functions is not
valid in our interpretation. As illustrated in Figure 2, the distance from a point to the intersection
of two sets may be strictly larger than the distances to either set (i.e., the similarity will be strictly
smaller). In general, therefore,

$$\min(\mathrm{Nec}(p), \mathrm{Nec}(q)) \not\le I(p \land q \mid \mathcal{E}),$$

making invalid, under this interpretation, the conjunctive law for necessities [11]

$$\mathrm{Nec}(p \land q) = \min(\mathrm{Nec}(p), \mathrm{Nec}(q)).$$


Figure  2:  Failure  of  Conjunctive  Necessity. 
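
The failure of the conjunctive law can be reproduced with a one-dimensional example. In the sketch below, which is only an illustration, worlds are hypothetical points of the unit interval with similarity $1 - |x - y|$, the evidential set contains a single world, and the sharpest necessity values (the degrees of implication themselves) are used for Nec(p) and Nec(q). All names and values are assumptions introduced here.

```python
def similarity(x, y):
    """Hypothetical similarity between worlds represented as points of [0, 1]."""
    return 1.0 - abs(x - y)


def implication(p, q):
    """I(p | q) on finite sets of worlds."""
    return min(max(similarity(w, v) for w in p) for v in q)


evidence = [0.5]              # evidential set: a single world
p = [0.4, 1.0]                # p-worlds: one close to the evidence, one far
q = [0.6, 1.0]                # q-worlds: one close to the evidence, one far
p_and_q = [w for w in p if w in q]   # only the far world satisfies both

nec_p = implication(p, evidence)          # sharpest admissible Nec(p)
nec_q = implication(q, evidence)          # sharpest admissible Nec(q)
i_conj = implication(p_and_q, evidence)   # I(p and q | evidence)
```

Both `nec_p` and `nec_q` are approximately 0.9, while `i_conj` is 0.5, so the minimum of the two necessity values is not a lower bound for the degree of implication of the conjunction.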

We  may  also  note  in  this  regard  that  the  similarity-based  model  that  is  discussed  here  does  not 
make  use  of  the  notion  of  negation  either  as  a  mechanism  to  generate  dual  concepts  or  on  its  own 
right  as  an  important  logical  concept.  It  is  the  intent  of  the  author  to  study,  in  the  immediate  future, 
alternative  models  where  notions  of  negation  and  maximal  dissimilarity  play  more  substantive  roles. 

4.5  Conditional  Possibilities  and  Necessities 

The concepts of conditional possibility and necessity are closely related to the previously introduced
unconditioned structures. These structures may be thought of as a characterization of the proximity
of a world w to some or all of the worlds where a proposition p is true, given that w is similar in
the degree 1 to the evidential set $\mathcal{E}$ (i.e., $w \vdash \mathcal{E}$). With this fact in mind, we could have used the
somewhat baroque formulation

$$C(p \mid \mathcal{E}) = \sup_{w \vdash \mathcal{E}} \big[ I(p \mid w) \circledast I(\mathcal{E} \mid w) \big]$$

to define unconditioned possibility distributions, a rather unnecessary effort if we consider that
$I(\mathcal{E} \mid w) = 1$ whenever $w \vdash \mathcal{E}$, showing its obvious equivalence to the simpler form used in
Section 3.3.2 above. In spite of such observation, the above identity is important in understanding
the purpose of the definitions given below. Those definitions interpret conditional possibilities and
necessities as a measure of the proximity of worlds in the evidential set $\mathcal{E}$ to (some or all) worlds
satisfying a (conditioned) proposition p relative to their proximity to (some or all) the worlds that
satisfy another (conditioning) proposition q.

The mechanism used to specify that relationship, which is closely related in spirit to results of
Valverde [44] on the structure of indistinguishability relations, is based on the pseudoinverse function
introduced in Section 4.1. The basic idea used by these definitions is also illustrated in Figure 3,
where, from the perspective of the evidential world w, the similarity between the p-world u and the
q-world v is estimated by means of an inequality that generalizes the “absolute value” form of the
triangular inequality, i.e.,

$$\delta(u, v) \ge | \, \delta(u, w) - \delta(v, w) \, |,$$

to its similarity-based form

$$S(u, v) \le \min \big[ S(u, w) \oslash S(v, w), \ S(v, w) \oslash S(u, w) \big].$$


Figure  3:  Similarities  as  Viewed  from  the  Evidential  Set. 

The  required  interplay  between  similarities  to  conditioning  and  conditioned  sets  is  captured  by 
the  following  definitions. 

Definition: Let $\mathcal{E}$ be an evidential set. A function Nec(·|·) mapping pairs of propositions in the
language $\mathcal{L}$ into [0, 1] is called a conditional necessity distribution for $\mathcal{E}$ if

$$\mathrm{Nec}(q \mid p) \le \inf_{w \vdash \mathcal{E}} \left[ I(q \mid w) \oslash I(p \mid w) \right]$$

for any propositions p and q in $\mathcal{L}$.

Definition: Let $\mathcal{E}$ be an evidential set. A function Poss(·|·) mapping pairs of propositions in the
language $\mathcal{L}$ into [0, 1] is called a conditional possibility distribution for $\mathcal{E}$ if

$$\mathrm{Poss}(q \mid p) \ge \sup_{w \vdash \mathcal{E}} \left[ I(q \mid w) \oslash I(p \mid w) \right]$$

for any propositions p and q in $\mathcal{L}$.

It is easy to see, from these definitions, that the values of a conditional necessity distribution are
never larger than the corresponding values of any conditional possibility distribution, i.e.,

$$\mathrm{Nec}(q \mid p) \le \mathrm{Poss}(q \mid p).$$

Furthermore, since I(·|·) is $\circledast$-transitive, then

$$I(q \mid w) \ge I(q \mid p) \circledast I(p \mid w).$$

From this inequality and the definition of pseudoinverse of a triangular norm, it is easy to see that
the necessity function that attains the bound given in the definition satisfies the inequality

$$\mathrm{Nec}(q \mid p) \ge I(q \mid p),$$

i.e., the bounds for necessity functions provided by the evidential-set perspective are stronger than
those that can be obtained by direct use of the degree of implication function (see footnote 9).
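
The bounds that appear in the two definitions, and the inequalities just derived from them, can be
checked numerically on a small finite model. The following is only a sketch under stated assumptions:
the worlds (integers on a line), the similarity `S`, the choice of the Łukasiewicz T-norm with its
pseudoinverse, and the sets `E`, `p`, and `q` are all hypothetical and are not taken from the text.

```python
from fractions import Fraction

# Hypothetical toy model: worlds are integers on a line (0..10).

def S(u, v):
    """Assumed similarity between worlds: 1 - |u - v| / 5, truncated at 0."""
    return max(Fraction(0), 1 - Fraction(abs(u - v), 5))

def tnorm(a, b):
    """Lukasiewicz T-norm (an assumed choice of triangular norm)."""
    return max(Fraction(0), a + b - 1)

def pinv(a, b):
    """Pseudoinverse of the Lukasiewicz T-norm: sup{c : tnorm(b, c) <= a}."""
    return min(Fraction(1), 1 + a - b)

def I_world(prop, w):
    """I(prop | w): similarity of w to the closest world satisfying prop."""
    return max(S(w, v) for v in prop)

def I(q, p):
    """Degree of implication I(q | p): worst case of I(q | w) over p-worlds."""
    return min(I_world(q, w) for w in p)

def nec_bound(q, p, E):
    """Upper bound for Nec(q | p): inf over evidential worlds."""
    return min(pinv(I_world(q, w), I_world(p, w)) for w in E)

def poss_bound(q, p, E):
    """Lower bound for Poss(q | p): sup over evidential worlds."""
    return max(pinv(I_world(q, w), I_world(p, w)) for w in E)

E = {2, 9}       # evidential set (assumed)
p = {3, 8}       # conditioning proposition, given as its set of worlds
q = {5, 6, 7}    # conditioned proposition

n, s = nec_bound(q, p, E), poss_bound(q, p, E)
print(n, s, I(q, p))   # prints: 3/5 4/5 3/5
```

In this particular model the necessity bound coincides with I(q | p) and lies below the possibility
bound, in agreement with the two inequalities above.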

Note also that if Nec(p) = 1, indicating that $I(p \mid \mathcal{E}) = 1$, and if Nec(q|p) = 1, then the above
definition of conditional necessity shows that $I(q \mid \mathcal{E}) = 1$, indicating that Nec(q) may be taken
to be equal to 1, thus generalizing the well-known axiom (consequential closure) of certain modal
systems (e.g., the system T, as discussed in Hughes and Cresswell [21])

$$\text{If } Np \text{ and } N(p \to q), \text{ then } Nq.$$

The definitions above can also be further interpreted as a way to compare the similarities between
evidential worlds and those in the conditioning and conditioned sets by noting that whenever

$$I(q \mid w) \ge I(p \mid w)$$

for every evidential world $w \vdash \mathcal{E}$, then Nec(q|p) may be chosen to be equal to 1. Similarly, if
there exists some world $w \vdash \mathcal{E}$ where this inequality holds, then Poss(q|p) = 1. In either case,
the maximum value for the conditional distribution (i.e., 1) is reached when the proximity
of one evidential world w (in the case of possibilities) or of every one of them (in the case of
necessities) to a world in the conditioned set exceeds the proximity of w to the conditioning set
p. Once again resorting to an apparent notational overkill, we may state this fact by
means of the identity function $\tau$ in the unit interval,

$$\tau : [0, 1] \to [0, 1] : \alpha \mapsto \alpha,$$

in the form

$$I(q \mid w) \ge \tau\left( I(p \mid w) \right),$$

for some $w \vdash \mathcal{E}$ in the case of possibilities, with the same inequality holding for every $w \vdash \mathcal{E}$ in the
case of necessities. We may, however, conceive of other functions

$$\gamma : [0, 1] \to [0, 1] : \alpha \mapsto \gamma(\alpha),$$

with $\gamma(\alpha) \ge \alpha$ to specify a stronger form of implication, as illustrated in Figure 4, i.e.,

$$I(q \mid w) \ge \gamma\left( I(p \mid w) \right).$$

Similarly, one may also conceive of functions $\psi$ with $\psi(\alpha) \le \alpha$ that may be used to model weaker
forms of implication.

9 A dual inequality for possibilities involving C(q | p) does not hold in general. It is easy to see, however, that
$C(q \mid \mathcal{E}) \oslash I(p \mid \mathcal{E})$ is a possibility function for q given p.


Figure  4:  Examples  of  Possible  Similarity  Relationships  between  Conditioning  and  Conditioned  Sets. 

Possibilistic calculi based on the propagation of truth-mappings of this type, first proposed by
Baldwin [2], are utilized in the RUM [4,5] and MILORD [18] expert systems. The particular case
when $\gamma = \tau$, stating that every $\alpha$-cut of the conditioning proposition p is fully enclosed (in the
conventional sense) in the $\alpha$-cut of the conditioned proposition q, has been called the truth mapping
in the fuzzy logic literature.

The primary purpose of conditional distributions, however, is to provide a quantitative measure
of the strength by which one proposition may be said to imply another, with a view to extending
inferential procedures by means of structures that superimpose the topological notion of continuity
upon a logical framework concerned with propositional validity.

5 GENERALIZED INFERENCE


The major inferential tool of fuzzy logic is the compositional rule of inference of Zadeh [53], which
generalizes the corresponding classical rule of inference by its ability to infer valid statements even
when a perfect match between facts and rule antecedent does not exist, i.e.,

$$\text{from}\quad \begin{array}{c} p \\ p \to q \\ \hline q \end{array} \quad\text{to its “approximate” version}\quad \begin{array}{c} p' \\ p \to q \\ \hline q' \end{array},$$

where p' and q' are similar to p and q, respectively. In this sense, the generalized modus ponens
operates as an “interpolation” (or, more precisely, as an “extrapolation”) procedure in possible-world
space.

Unlike  the  interpolation  procedures  of  numerical  analysis,  however,  which  yield  estimates  of 
function  value,  this  extrapolation  procedure  approximates  truth  in  the  sense  that  it  produces  a 
proposition  that  is  both  more  general  than  the  consequent  of  the  inferential  rule  and  resembles  it 
to  some  degree  (which  is  a  function  of  the  degree  by  which  p'  resembles  p).  The  “extrapolated 
conclusion,”  however,  is  a  correctly  derived  proposition,  i.e.,  the  result  of  a  sound  logical  procedure 
rather  than  of  an  approximate  heuristic  technique. 

5.1  Generalized  Modus  Ponens 

The theorems that are proven below are based on the use of a family $\mathcal{P}$ of propositions that
partitions the universe of discourse U in the sense that every possible world will satisfy at least one
proposition in $\mathcal{P}$.

Definition: If $\mathcal{P}$ is a subset of satisfiable propositions in $\mathcal{L}$ such that, for every possible world w in
the universe U, there exists a proposition p in $\mathcal{P}$ such that $w \vdash p$, then the family $\mathcal{P}$ is called
a partition of U.

These results make use of information such as the values of the unconditioned necessity (resp.,
possibility) distributions for antecedent propositions p in the family $\mathcal{P}$ together with the values Nec(q|p)
(resp., Poss(q|p)) to “extend” the unconditioned distributions to the “consequent” proposition q.
In this sense, these findings interpret, in the same spirit used in the theorem of Section 4.4 for other
basic laws, the generalized modus ponens laws of fuzzy logic:

$$\mathrm{Nec}(q) = \sup_{p \in \mathcal{P}} \left[ \mathrm{Nec}(q \mid p) \circledast \mathrm{Nec}(p) \right],$$

$$\mathrm{Poss}(q) = \sup_{p \in \mathcal{P}} \left[ \mathrm{Poss}(q \mid p) \circledast \mathrm{Poss}(p) \right].$$


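Theorem (Generalized Modus Ponens for Necessity functions): Let $\mathcal{P}$ be a partition of U and
let q be a proposition. If Nec(p) and Nec(q|p) are real values, defined for every proposition p in
the partition, such that

$$\mathrm{Nec}(p) \le I(p \mid \mathcal{E}),$$

$$\mathrm{Nec}(q \mid p) \le \inf_{w \vdash \mathcal{E}} \left[ I(q \mid w) \oslash I(p \mid w) \right],$$

then the following inequality is valid:

$$\sup_{p \in \mathcal{P}} \left[ \mathrm{Nec}(q \mid p) \circledast \mathrm{Nec}(p) \right] \le I(q \mid \mathcal{E}).$$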

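Proof: Note first that since $\oslash$ is nonincreasing in its second argument and since

$$I(p \mid \mathcal{E}) \le I(p \mid w)$$

for every evidential world w, it is

$$\mathrm{Nec}(q \mid p) \le \inf_{w \vdash \mathcal{E}} \left[ I(q \mid w) \oslash I(p \mid w) \right] \le \inf_{w \vdash \mathcal{E}} \left[ I(q \mid w) \oslash I(p \mid \mathcal{E}) \right].$$

It follows then from the monotonicity and continuity of $\circledast$ with respect to its arguments that

$$\begin{aligned}
\mathrm{Nec}(p) \circledast \mathrm{Nec}(q \mid p) &\le I(p \mid \mathcal{E}) \circledast \inf_{w \vdash \mathcal{E}} \left[ I(q \mid w) \oslash I(p \mid \mathcal{E}) \right] \\
&= \inf_{w \vdash \mathcal{E}} \left[ I(p \mid \mathcal{E}) \circledast \left( I(q \mid w) \oslash I(p \mid \mathcal{E}) \right) \right] \\
&\le \inf_{w \vdash \mathcal{E}} I(q \mid w) \\
&= I(q \mid \mathcal{E}),
\end{aligned}$$

since

$$I(p \mid \mathcal{E}) \circledast \left( I(q \mid w) \oslash I(p \mid \mathcal{E}) \right) \le I(q \mid w),$$

because of the definition of $\oslash$ and the continuity of $\circledast$.

Since the above inequality is valid for any proposition p in $\mathcal{P}$, the theorem follows. ∎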

A  dual  result  also  holds  for  possibility  functions. 

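Theorem (Generalized Modus Ponens for Possibility functions): Let $\mathcal{P}$ be a partition of U and
let q be a proposition. If Poss(p) and Poss(q|p) are real values, defined for every proposition p in
$\mathcal{P}$, such that

$$\mathrm{Poss}(p) \ge C(p \mid \mathcal{E}),$$

$$\mathrm{Poss}(q \mid p) \ge \sup_{w \vdash \mathcal{E}} \left[ I(q \mid w) \oslash I(p \mid w) \right],$$

then the following inequality is valid:

$$\sup_{p \in \mathcal{P}} \left[ \mathrm{Poss}(q \mid p) \circledast \mathrm{Poss}(p) \right] \ge C(q \mid \mathcal{E}).$$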

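Proof: Note first that if w is an evidential world, then

$$C(p \mid \mathcal{E}) \ge I(p \mid w).$$

It follows then from the nonincreasing nature of $\oslash$ with respect to its second argument that

$$\mathrm{Poss}(q \mid p) \ge \sup_{w \vdash \mathcal{E}} \left[ I(q \mid w) \oslash I(p \mid w) \right] \ge \sup_{w \vdash \mathcal{E}} \left[ I(q \mid w) \oslash C(p \mid \mathcal{E}) \right],$$

and, therefore, that

$$\mathrm{Poss}(q \mid p) \circledast \mathrm{Poss}(p) \ge \sup_{w \vdash \mathcal{E}} \left[ I(q \mid w) \oslash C(p \mid \mathcal{E}) \right] \circledast C(p \mid \mathcal{E}).$$

Taking now, in the above expression, the supremum with respect to all propositions p in $\mathcal{P}$, we obtain

$$\sup_{p \in \mathcal{P}} \left[ \mathrm{Poss}(q \mid p) \circledast \mathrm{Poss}(p) \right] \ge \sup_{p \in \mathcal{P}} \left\{ \sup_{w \vdash \mathcal{E}} \left[ I(q \mid w) \oslash C(p \mid \mathcal{E}) \right] \circledast C(p \mid \mathcal{E}) \right\}. \tag{1}$$

Note, however, that since $\mathcal{P}$ is a partition, there always exists a proposition p in $\mathcal{P}$ such that
$C(p \mid \mathcal{E}) = 1$ (i.e., p “intersects” $\mathcal{E}$) and, therefore, for that proposition p,

$$\begin{aligned}
\sup_{p \in \mathcal{P}} \left\{ \sup_{w \vdash \mathcal{E}} \left[ I(q \mid w) \oslash C(p \mid \mathcal{E}) \right] \circledast C(p \mid \mathcal{E}) \right\} &\ge \sup_{w \vdash \mathcal{E}} \left[ I(q \mid w) \oslash C(p \mid \mathcal{E}) \right] \circledast C(p \mid \mathcal{E}) \\
&= \sup_{w \vdash \mathcal{E}} I(q \mid w) \\
&= C(q \mid \mathcal{E}).
\end{aligned} \tag{2}$$

The thesis follows at once by combination of the inequalities (1) and (2). ∎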


Finally, notice also that, although the theorems above have been characterized as duals, it is
not necessary that $\mathcal{P}$ be a partition for the generalized modus ponens for necessities to hold, while
the proof of its possibilistic counterpart relies on that assumption. It should be clear, however,
that richer propositional collections $\mathcal{P}$ would lead to better lower bounds for values of the degree
of implication $I(q \mid \mathcal{E})$.
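
Both theorems can be exercised on a finite model. The sketch below is illustrative only: the worlds,
the similarity `S`, the Łukasiewicz T-norm, the covering family `PARTITION`, and the sets `E` and `q`
are assumptions made for the example. It takes Nec and Poss at their extreme admissible values and
compares the resulting estimates with I(q | E) and C(q | E).

```python
from fractions import Fraction

def S(u, v):
    """Assumed similarity on integer worlds: 1 - |u - v| / 5, truncated at 0."""
    return max(Fraction(0), 1 - Fraction(abs(u - v), 5))

def tnorm(a, b):
    """Lukasiewicz T-norm (assumed choice)."""
    return max(Fraction(0), a + b - 1)

def pinv(a, b):
    """Pseudoinverse of the Lukasiewicz T-norm: sup{c : tnorm(b, c) <= a}."""
    return min(Fraction(1), 1 + a - b)

def I_world(prop, w):
    """I(prop | w) for a single world w."""
    return max(S(w, v) for v in prop)

def I(prop, E):
    """Degree of implication I(prop | E)."""
    return min(I_world(prop, w) for w in E)

def C(prop, E):
    """Degree of consistence C(prop | E)."""
    return max(I_world(prop, w) for w in E)

def gmp_necessity(q, partition, E):
    """sup over p of Nec(q|p) (*) Nec(p), with the largest admissible values."""
    best = Fraction(0)
    for p in partition:
        nec_p = I(p, E)
        nec_q_p = min(pinv(I_world(q, w), I_world(p, w)) for w in E)
        best = max(best, tnorm(nec_q_p, nec_p))
    return best

def gmp_possibility(q, partition, E):
    """sup over p of Poss(q|p) (*) Poss(p), with the smallest admissible values."""
    best = Fraction(0)
    for p in partition:
        poss_p = C(p, E)
        poss_q_p = max(pinv(I_world(q, w), I_world(p, w)) for w in E)
        best = max(best, tnorm(poss_q_p, poss_p))
    return best

E = {2, 3}                                          # evidential set (assumed)
PARTITION = [{0, 1, 2, 3}, {4, 5, 6}, {7, 8, 9, 10}]  # covers worlds 0..10
q = {5, 6, 7}                                       # consequent proposition

lower = gmp_necessity(q, PARTITION, E)
upper = gmp_possibility(q, PARTITION, E)
print(lower, I(q, E), upper, C(q, E))   # prints: 2/5 2/5 3/5 3/5
```

In this example both estimates are tight: the necessity estimate equals I(q | E) and the possibility
estimate equals C(q | E). Coarser families would in general give looser bounds.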

5.2 Variables

The $\circledast$-transitivity property of I is the essential fact expressing the relationships between the degrees
of implication of three propositions that were proven in the previous section. The statements of
these relations in most works devoted to fuzzy logic are made, however, using special subsets of the
universe of discourse that are described through the important notion of variable. Introduction of
this concept, which is also central to other approximate reasoning methodologies, permits us to make
a clearer distinction between similarities defined, in some absolute sense, from the joint viewpoint of
several respects and related proximity measures that compare objects (in our case, possible worlds)
from the marginal viewpoint of one or more variables.

In what follows, we will assume that only certain propositions, specifying the value of a system
variable belonging to a finite set

$$\mathcal{V} = \{X, Y, Z, \ldots\},$$

will be used to characterize possible worlds.

The propositions of interest are those formed by logical combination of statements of the type

“The value of the variable V is v,”

where V is in the variable set $\mathcal{V}$ and where v is a specific value in the domain $\mathcal{D}(V)$ of the variable
V.

We  will  also  assume  that,  in  any  possible  world,  the  value  of  any  variable  is  a  member  of  the 
corresponding  domain  of  definition  of  the  variable.  In  the  context  of  our  discussion,  we  will  not 
need  to  make  special  assumptions  about  the  scalar  or  numeric  nature  of  the  state  variables,  using 
the  notion  in  the  same  primitive  and  general  sense  in  which  it  is  customarily  used  in  the  predicate 
calculus. 

We will be especially interested in subsets, called variable-sets, of the universe U consisting of
worlds where the value of some variable V is equal to a specified value v. We will denote by [X = x]
(similarly [Y = y], etc.) the set of all possible worlds where the proposition “The value of the
variable X is x” is true. Clearly, the variable-sets in the collection

$$\{\, [X = x] : x \text{ is in } \mathcal{D}(X) \,\}$$

partition the universe into disjoint subsets. These collections have recently been used to
characterize the concept of rough sets [30], of importance in many information-system analysis problems,
including some that arise in the context of approximate reasoning. A similar notion has also been
used to describe algorithms for the combination of probabilities and of belief functions [39].

To simplify the notation we will write

$$w \vdash x, \quad w \vdash y, \quad \ldots$$

as shorthand for $w \vdash [X = x]$, $w \vdash [Y = y]$, ..., respectively.

5.2.1 Possibilistic Structures and Laws

The usual statements of the laws of fuzzy logic are made, as mentioned before, through the use of
variables rather than by means of general symbolic expressions. It is customary, for example, to
speak of the possibility of the variable X taking the value x, to describe the value that a possibility
function for an evidential set $\mathcal{E}$ attains for the proposition [X = x].

In our model, we will say, therefore, that a function

$$\mathrm{Poss}(\cdot) : \mathcal{D}(X) \to [0, 1]$$

is a possibility function for the evidential set $\mathcal{E}$ and the variable X, whenever

$$\mathrm{Poss}(x) \ge C([X = x] \mid \mathcal{E}),$$

for all values x in the domain $\mathcal{D}(X)$. Similarly, we will say that Nec(·) is a necessity function for
X whenever

$$\mathrm{Nec}(x) \le I([X = x] \mid \mathcal{E}),$$

for all values x in $\mathcal{D}(X)$.

If possibility distributions are defined in this way as point functions in the variable
domain $\mathcal{D}(X)$, then it is possible to use the disjunctive laws of fuzzy logic proved in Section 4.4 to
extend their definition over the power set of $\mathcal{D}(X)$, i.e.,

$$\mathrm{Nec}(A \cup B) = \max \left[ \mathrm{Nec}(A), \mathrm{Nec}(B) \right],$$

$$\mathrm{Poss}(A \cup B) = \max \left[ \mathrm{Poss}(A), \mathrm{Poss}(B) \right],$$

where A and B are subsets of the domain $\mathcal{D}(X)$. These equations are usually given as the basic
disjunctive laws of possibility distributions.

Note  that,  using  such  extensions,  both  possibility  and  necessity  functions  are  nondecreasing 
functions  (with  respect  to  the  order  induced  by  set  inclusion).  The  value  of  Nec(A)  measures 
the  extent  by  which  the  evidence  supports  the  statement  that  the  variable  value  necessarily  lies  in 
the  subset  A  of  its  domain  of  definition,  with  a  dual  interpretation  being  applicable  for  possibility 
distributions. 
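
The extension from point functions to subsets of the domain can be sketched in a few lines. This is
a minimal illustration: the helper name `extend_by_max`, the sample values in `poss_x`, and the
convention that the empty set receives the value 0 are assumptions made for the example.

```python
def extend_by_max(point_fn):
    """Extend a point function on a finite domain to its subsets using max."""
    def set_fn(subset):
        # Assumed convention: the empty subset is assigned 0.
        return max((point_fn[x] for x in subset), default=0.0)
    return set_fn

# Hypothetical point possibility function for a variable X.
poss_x = {"low": 0.2, "medium": 1.0, "high": 0.6}
Poss = extend_by_max(poss_x)

A = {"low"}
B = {"low", "high"}
print(Poss(A), Poss(B), Poss(A | B))   # prints: 0.2 0.6 0.6
```

Because the extension takes a maximum, it satisfies the disjunctive law by construction and is
nondecreasing with respect to set inclusion (here A is a subset of B).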

5.2.2  Marginal  and  Joint  Possibilities 

The  original  similarity  relation  introduced  in  Section  3.1  may  be  considered  to  be  a  measure  of 
proximity  between  possible  worlds  from  the  joint  viewpoint  of  all  system  variables.  The  notion 
of  variable  permits,  however,  the  definition  of  similarities  from  the  restricted  viewpoint  of  some 
variables  or  subsets  of  variables. 

These restricted perspectives play a role with respect to the original similarity S that is analogous
to that of marginal probability distributions with respect to joint probability distributions. To derive
useful expressions that describe similarities between two values x and x' of the same variable X,
it should be noted first that the degree of implication I(·|·) is transitive. This fact permits the
application of a theorem of Valverde [44] to define a function $S_X$ by means of the expression

$$S_X : \mathcal{D}(X) \times \mathcal{D}(X) \to [0, 1] : (x, x') \mapsto \min \left[ I(x \mid x'), I(x' \mid x) \right].$$

Defined in this way as a “symmetrization” of the preorder induced by the degree of implication
I(·|·), the marginal similarity $S_X$ has the properties of a similarity function. Furthermore, the
“projection” operation entailed by the use of I(x|x') (based on the projection of every x'-world
into the set of x-worlds) may be considered to be the basic mechanism to transform the original
similarity function into one that only discerns differences in the values of the variable X.

It must be noted, however, that, unless additional assumptions are made about the nature of the
original similarity S, the function $S_X$ fails to satisfy the intuitive requirement

$$S(w, w') \le S_X(x, x'),$$

whenever $w \vdash x$ and $w' \vdash x'$, i.e., the requirement that the similarity between two objects from a
restricted viewpoint be at least as high as their similarity from more general regards that encompass
additional criteria of comparison.
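
Both the symmetrized definition of the marginal similarity and the failure of the intuitive
requirement can be seen in a very small model. The sketch below is only illustrative: the three
worlds, the city-block similarity `S`, and the function names are assumptions; worlds are pairs
(x, y) and the universe is deliberately not a full grid.

```python
from fractions import Fraction

# Hypothetical universe of worlds (x, y); not every (x, y) pair is a world.
WORLDS = [(0, 0), (1, 0), (1, 5)]

def S(u, v):
    """Assumed joint similarity: 1 - (city-block distance) / 5, truncated at 0."""
    d = abs(u[0] - v[0]) + abs(u[1] - v[1])
    return max(Fraction(0), 1 - Fraction(d, 5))

def x_worlds(x):
    """The variable-set [X = x]."""
    return [w for w in WORLDS if w[0] == x]

def I(x, x_prime):
    """I(x | x'): inf over x'-worlds of the sup similarity to some x-world."""
    return min(max(S(w, w2) for w in x_worlds(x)) for w2 in x_worlds(x_prime))

def S_X(x, x_prime):
    """Marginal similarity: symmetrization of the degree of implication."""
    return min(I(x, x_prime), I(x_prime, x))

w, w2 = (0, 0), (1, 0)
print(S(w, w2), S_X(0, 1))   # prints: 4/5 0
```

The worlds (0, 0) and (1, 0) are similar in the degree 4/5 from the joint viewpoint, yet the marginal
similarity between the values 0 and 1 of X is 0, because the world (1, 5) is far from every world
with X = 0 and the infimum in I(0 | 1) is taken over all worlds with X = 1.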

Although considerable research remains to identify alternative definitions of marginal similarities
that are not hampered by this problem, a basic result of Valverde [44], presented in Section 6.2 below,
appears to provide the essential tool that must be employed to produce the required coarser
measures. The role of additional reasonable assumptions that might be demanded from S so as to
facilitate the construction of marginal similarities with desirable characteristics is also the object of
current investigations of the author.


5.2.3  Conditional  Distributions  and  Generalized  Inference 

The basic conditional structures of fuzzy logic are usually defined as elastic constraints that restrict
the values of a variable given those of another. By simple extension of our previous convention to
conditional structures, we will write Nec(y|x) and Poss(y|x) as shorthand for

$$\mathrm{Nec}([Y = y] \mid [X = x]) \quad\text{and}\quad \mathrm{Poss}([Y = y] \mid [X = x]),$$

respectively.

If  a  classical  (i.e.,  Boolean)  inferential  rule  of  the  type 

“If  X  =  x,  then  Y  is  in  R(x)” 


is thought of as the definition of a relation R defined over pairs (x, y) in the Cartesian product
X × Y, then such a relation may be used to define a multivalued mapping that maps possible values
of X into possible values of Y, as illustrated in Figure 5.



Figure  5:  Inference  as  a  Compatibility  Relation. 

Such a compatibility-relation perspective was an essential element of the original formulations
of both the compositional rule of inference [51] and the Dempster-Shafer calculus of evidence [8],
where distributions in some space (i.e., the domain of some variable X) are mapped into
distributions in another space (i.e., the domain of another variable Y) by direct transfer of “mass”
from individual values to the union of their mapped projections.

Note that, whenever Poss(y|x) = 1, if the bound is actually attained, i.e., if

$$\sup_{w \vdash \mathcal{E}} \left[ I(y \mid w) \oslash I(x \mid w) \right] = 1,$$

then it is possible for an evidential world w in [X = x] (i.e., I(x | w) = 1) to be such that $w \vdash y$.
Pairs (x, y) such that Poss(y|x) = 1 may be considered to approximate the core (see footnote 10) of a
generalized inferential relation that allows the determination of bounds for the similarity between
evidential worlds and those in the variable set [Y = y] on the basis of knowledge of similar bounds
applicable to the variable set [X = x]. This relation, which is the fuzzy extension of the classical
compatibility mapping R illustrated in Figure 5, may be thought of as a descriptor of the behavior,
for x-worlds, of the values of the variable Y “near” R. The compatibility relation is itself approximated
by (or embedded in) the core of the conditional possibility distribution, i.e., worlds w such that
$w \vdash x$ and $w \vdash y$, with Poss(y|x) = 1.

Since the collection of the sets [X = x] partitions the universe U into disjoint sets, then the
generalized modus ponens laws may be readily stated in terms of variable values as

$$\mathrm{Nec}(y) = \sup_{x} \left[ \mathrm{Nec}(y \mid x) \circledast \mathrm{Nec}(x) \right],$$

$$\mathrm{Poss}(y) = \sup_{x} \left[ \mathrm{Poss}(y \mid x) \circledast \mathrm{Poss}(x) \right],$$

clearly showing the basic nature of the inferential mapping as the composition of relational
combination (i.e., “intersection”) and projection (i.e., maximization).
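
The variable form of the generalized modus ponens is a sup-T composition of a conditional relation
with a marginal distribution. The sketch below illustrates that composition for finite domains. The
function name, the dictionary layout, the sample values, and the use of the minimum as the default
T-norm are all assumptions made for the example.

```python
def generalized_modus_ponens(cond, marginal, tnorm=min):
    """Return {y: max over x of tnorm(cond[(y, x)], marginal[x])}."""
    result = {}
    for (y, x), value in cond.items():
        combined = tnorm(value, marginal[x])               # relational combination
        result[y] = max(result.get(y, 0.0), combined)      # projection on Y
    return result

# Hypothetical possibility function for X and conditional possibilities Poss(y|x).
poss_x = {"low": 1.0, "high": 0.3}
poss_y_given_x = {
    ("cold", "low"): 1.0, ("cold", "high"): 0.2,
    ("warm", "low"): 0.4, ("warm", "high"): 1.0,
}

poss_y = generalized_modus_ponens(poss_y_given_x, poss_x)
print(poss_y)   # prints: {'cold': 1.0, 'warm': 0.4}
```

Any other T-norm can be passed as `tnorm`, provided it is the one on which the underlying calculus
is based.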


5.2.4  Fuzzy  Implication  Rules 

In  this  section  we  will  examine  proposed  interpretations  for  conditional  rules,  usually  stated  in  the 
form 

If  X  is  A,  then  Y  is  B , 

within  the  context  of  possibilistic  logic.  While,  in  two-valued  logic,  any  such  rule  simply  states  that 
whenever  a  condition  A  is  true,  another  condition  B  also  holds,  various  interpretations  have  been 
proposed  for  rules  expressing  other  notions  of  conditional  truth. 

In the case of probabilities, for example, degrees of conditionality have been modeled either by
means of conditional probability values Prob(B | A), which measure the likelihood of B given the
assumed truth of A, or by the alternative interpretation Prob(¬A ∨ B), used by Nilsson [29] in his
probabilistic logic, which essentially quantifies the probability that a rule is a valid component of a
knowledge base. Either one of these interpretations is valid in particular contexts, being, respectively,
the probabilistic extensions of the so-called “de re,” i.e.,

$$p \to \Box q,$$

and “de dicto,” i.e.,

$$\Box (p \to q),$$

interpretations of conditionals in modal logic.

10 The core of a fuzzy set $\mu : U \to [0, 1]$ is the set of all points w such that $\mu(w) = 1$, i.e., the points that “fully”
belong to $\mu$.

In fuzzy logic, two major interpretations have been advanced to translate conditional rules (see
footnote 11), with A and B corresponding to the fuzzy sets

$$\mu_A : X \to [0, 1] \quad\text{and}\quad \mu_B : Y \to [0, 1].$$

The first interpretation was originally proposed by Zadeh [52], as a formal translation of the
statement

If $\mu_A$ is a possibility for X, then $\mu_B$ is a possibility distribution for Y.

This conditional statement, which may be regarded as a constraint on the values of one variable
given those of another, states the existence of a conditional possibility function Poss(·|·) such that

$$\mu_B(y) \ge \sup_{x} \left[ \mathrm{Poss}(y \mid x) \circledast \mu_A(x) \right] \ge \mathrm{Poss}(y \mid x) \circledast \mu_A(x).$$

Recalling now the definition and properties of the pseudoinverse, we may restate this particular
interpretation as

$$\mathrm{Poss}(y \mid x) = \mu_B(y) \oslash \mu_A(x) \ge I(y \mid w) \oslash I(x \mid w),$$

for every world $w \vdash \mathcal{E}$.

In Zadeh’s original formulation, made within the context of a calculus based on the minimum
function as the T-norm, conditionals were, however, formally translated by means of the
pseudoinverse of the Łukasiewicz T-norm. Certain formal problems associated with such a combination were
pointed out by Trillas and Valverde [42], who developed translations consistent with the T-norm
used as the basis for the possibilistic calculus.

Using the characterization of conditionals introduced in Section 4.5, this relation may also be
thought of as a measure of the degree by which a possibility for Y exceeds a fraction (measured
by the conditional possibility distribution) of a given possibility distribution for X. In particular,
whenever Poss(y|x) = 1, then $\mu_B(y) \ge \mu_A(x)$, indicating the possible existence (since Poss(y|x)
is only an upper bound of $I(y \mid w) \oslash I(x \mid w)$) of an evidential world such that $w \vdash x$ and $w \vdash y$,
with x in A and y in B.

As illustrated in Figure 6, where it has been assumed that the underlying metric (i.e., dissimilarity)
is proportional to the Euclidean distance in the plane, the core of the corresponding conditional
possibility distribution is an (upper) approximant of a classical compatibility relation (indicated by
the shaded area in the figure) that fans outward from the Cartesian product of the cores of A and B.
If this interpretation is taken, whenever several such rules are available, then each one of these rules
will lead to a separate possibility distribution. Combination of these upper bounds by minimization
results in a sharper possibility estimate that represents the “integrated” effect of the rule set.

The second interpretation of conditional relations, leading to a wide variety of practical
applications [41], was utilized by Mamdani and Assilian to develop fuzzy controllers. The basic idea
underlying this explanation follows an approach originally outlined by Zadeh [47,48,51]. In this case,
a number of conditional statements of the form

If X is $A_k$, then Y is $B_k$,  k = 1, 2, ..., n,

are given as a combined “disjunctive” description of the relation between X and Y, rather than
as a set of independently valid rules. The purpose of this rule set is the approximation of the
compatibility relation by a “fuzzy curve” generated by disjunction of all the rules in the set, as
shown in Figure 7.

11 A rather encompassing account of potential fuzzy reasoning mechanisms can be found in a paper by Mizumoto,
Fukami, and Tanaka [27].

Recalling the characterization of conditioning as an extension of a classical compatibility relation,
we may say that the core of the compatibility relation is approximated from above by the union

$$\bigcup_{k=1}^{n} \left[ \mathrm{core}(\mu_{A_k}) \times \mathrm{core}(\mu_{B_k}) \right]$$

of the Cartesian products of the cores of the fuzzy sets for $A_k$ and $B_k$. In this case the multiple rules
are meant to approximate some region of possible (X, Y) values, and the results of applying the
individual component rules must be combined using maximization to produce a conditional
possibility function. We may say, therefore, that under the Zadeh-Mamdani-Assilian (ZMA) interpretation,
the function

$$\mathrm{Poss}(y \mid x) = \sup_{k} \left[ \min\left( \mu_{A_k}(x), \mu_{B_k}(y) \right) \right]$$

is a conditional possibility for Y given X.

It is important to note that the two interpretations of fuzzy rules that we have just examined
are based on different approaches to the approximation (from above) of the value

$$\sup_{w \vdash \mathcal{E}} \left[ I(y \mid w) \oslash I(x \mid w) \right],$$

being, in the case of the Zadeh-Trillas-Valverde (ZTV) method, the result of the conjunction of
multiple fuzzy relations such as that illustrated in Figure 8, while, in the case of the ZMA logic, the
construction requires disjunction of relations such as that illustrated in Figure 9.
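
The contrast between the conjunctive (ZTV) and disjunctive (ZMA) constructions can be made
concrete with a two-rule example. The sketch below is a hedged illustration, not the construction
used to produce the figures: the triangular membership functions, the two rules in `RULES`, and the
use of the minimum T-norm with its own pseudoinverse for the ZTV relation are assumptions made for
the example.

```python
from fractions import Fraction

def tri(a, b, c):
    """Triangular membership function with support (a, c) and core {b}."""
    def mu(x):
        x = Fraction(x)
        if x <= a or x >= c:
            return Fraction(0)
        if x <= b:
            return (x - a) / Fraction(b - a)
        return (c - x) / Fraction(c - b)
    return mu

def pinv_min(a, b):
    """Pseudoinverse of the minimum T-norm: sup{c : min(b, c) <= a}."""
    return Fraction(1) if b <= a else a

# Hypothetical rule set: pairs (mu_A_k, mu_B_k) for "If X is A_k, then Y is B_k".
RULES = [
    (tri(0, 2, 4), tri(0, 2, 4)),
    (tri(4, 6, 8), tri(4, 6, 8)),
]

def poss_ztv(y, x):
    """Conjunction (min) of the per-rule relations mu_B(y) (/) mu_A(x)."""
    return min(pinv_min(mu_b(y), mu_a(x)) for mu_a, mu_b in RULES)

def poss_zma(y, x):
    """Disjunction (max) of the per-rule relations min(mu_A(x), mu_B(y))."""
    return max(min(mu_a(x), mu_b(y)) for mu_a, mu_b in RULES)

for x, y in [(2, 2), (2, 6), (3, 3), (9, 5)]:
    print((x, y), poss_ztv(y, x), poss_zma(y, x))
```

At the point (2, 2), inside the Cartesian product of the cores of the first rule, both relations take
the value 1, and at (2, 6) both take the value 0. Away from the cores they diverge: at (9, 5), where
no antecedent applies, the conjunctive relation imposes no constraint (value 1) while the disjunctive
relation gives 0.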

The difference between both approaches when combining several rules is illustrated also in
Figures 10 and 11, showing the contour plots for the $\alpha$-cuts of the fuzzy relations that are obtained
in a simple example involving four rules. In these figures, the rectangles with a dark outline
correspond to the Cartesian products of the cores of the antecedents $A_k$ and consequents $B_k$. Darker
shades of gray correspond to higher degrees of membership.

The  reader  should  be  cautioned,  however,  about  the  potential  for  invalid  comparisons  that  may 
result  from  hasty  examination  of  these  figures.  Each  formalism  should  be  regarded  as  a  procedure  for 
the  approximation  of  a  compatibility  relation  that  is  based  on  a  different  approach  for  the  description 
of  relationships  between  variables.  In  the  case  of  the  ZMA  interpretation,  the  intent  is  to  generalize 
the  interpolation  procedures  that  are  normally  employed  in  functional  approximation.  As  such,  this 
approach  may  be  said  to  be  inspired  by  the  methodology  of  classical  system  analysis.  The  ZTV 
approach,  by  contrast,  is  a  generalization  of  classical  logical  formulations  and  may  be  regarded, 
from  a  relational  viewpoint,  as  a  procedure  to  describe  a  function  as  the  locus  of  points  that  satisfies 
a  set  of  constraints  rather  than  as  a  subset  of  “fuzzy  points”  of  a  Cartesian  product. 

Figures 10 and 11, while showing that the same rule sets would lead to radically different results,
should not be considered, therefore, to discredit interpolative approaches, as such techniques,
proceeding from a different perspective, should normally be based on rule sets that are different from
those utilized when rules are thought of as independent constraints.

Figure 8: A Possibilistic Conditional Rule (ZTV)


Figure 9: A Component of a Disjunctive Rule Set (ZMA)

Figure 10: Contour Plots for a Rule Set (ZTV)


Figure 11: Contour Plots for a Rule Set (ZMA)

6 THE NATURE OF SIMILARITY RELATIONS


In  this  closing  section,  we  will  examine  issues  that  arise  naturally  from  our  previous  examination  of 
the  role  of  similarities  as  the  semantic  bases  for  possibility  theory. 

Our  discussion  focuses  on  two  topics.  We  look  first  at  the  requirements  that  our  theory  imposes 
upon  the  nature  of  the  scales  used  to  measure  proximity  or  resemblance  between  possible  worlds. 
Finally,  our  examination  of  the  interplay  between  similarities  and  possibilities  turns  to  issues  related 
to  the  generation  of  similarity  relations  from  such  sources  as  domain  knowledge  that  describes 
significant  relations  between  system  variables. 

6.1  On  Similarity  Scales 

Our  previous  interpretation  of  possibilistic  concepts  and  structures  has  been  based  on  the  use  of 
measures  of  proximity  that  quantify  interobject  resemblance  using  real  numbers  between  0  and  1. 
Our  assumptions  about  the  use  of  the  [0, 1]  interval  as  a  similarity  scale  have  been  made  primarily, 
however,  as  a  matter  of  convenience  so  as  to  simplify  the  description  of  our  model  while  being 
consistent  with  the  customary  definitions  of  possibility  and  necessity  distributions  as  functions  taking 
values  in  that  interval. 

Close  examination  of  the  actual  requirements  imposed  upon  our  similarity  scales  reveals,  however, 
that  our  measurement  domain  may  be  quite  general  so  as  to  include  symbolic  structures  such  as 

{ identical, very similar, ..., completely dissimilar }.

Our model is based on the use of a partially ordered set having a maximal and a minimal element
that measure identity and complete dissimilarity, respectively. Furthermore, we have assumed the
existence of a binary operation (the triangular norm ⊛) mapping pairs of possible worlds into real
numbers, with certain desirable order-preserving and transitive properties. The concept of triangular
norm, however, does not rely substantially on the use of real numbers as its range and may be readily
extended to more general partially ordered sets with maximal and minimal elements.

We have also assumed a continuity property for the triangular norm operation. This property,
however, simply requires that a notion of proximity also exist among similarity values so as to
provide a form of (order-consistent) topology in that space. While, in general, more precise scales
will result in more detailed representations of interworld similarity, it is important to stress that the
similarity-based model presented here does not rely on “denseness” assumptions such as the existence
of an intermediate value c between any two different values a and b in the similarity-measurement scale.
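
The point about non-dense, symbolic scales can be illustrated with a minimal sketch. It assumes a five-level ordinal scale whose intermediate labels are hypothetical (the text names only the endpoints and "very similar") and uses the minimum as the triangular norm, which is only one of several admissible choices. The helper `is_tnorm` is likewise an illustrative name, not part of the model.

```python
from enum import IntEnum
from itertools import product


class Sim(IntEnum):
    """A five-level symbolic similarity scale (intermediate labels are illustrative)."""
    COMPLETELY_DISSIMILAR = 0   # minimal element
    SOMEWHAT_SIMILAR = 1        # hypothetical intermediate label
    SIMILAR = 2                 # hypothetical intermediate label
    VERY_SIMILAR = 3
    IDENTICAL = 4               # maximal element


def tnorm(a: Sim, b: Sim) -> Sim:
    """Minimum on the ordered scale, used here as the triangular norm."""
    return min(a, b)


def is_tnorm(op, scale) -> bool:
    """Check commutativity, associativity, monotonicity, and the unit law."""
    top = max(scale)
    for a, b, c in product(scale, repeat=3):
        if op(a, b) != op(b, a):
            return False
        if op(op(a, b), c) != op(a, op(b, c)):
            return False
        if b <= c and op(a, b) > op(a, c):
            return False
    return all(op(a, top) == a for a in scale)


SCALE = list(Sim)
```

Calling `is_tnorm(tnorm, SCALE)` confirms the usual axioms on this finite scale, even though no value lies strictly between adjacent levels such as `SIMILAR` and `VERY_SIMILAR`.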

From a practical viewpoint, the major requirement is to quantify proximity in such a way as to
be able to determine that two quantities are similar to some degree (i.e., approximate matching).
The degree of precision that such a matching entails is problem-dependent and will typically be the
result of a compromise between the desire, on one hand, to keep granularity relatively coarse
to reduce complexity, and the need, on the other, to describe system behavior at an acceptable level
of accuracy. The work of Bonissone and Decker [4] is a significant example of the type of systematic
study that must be carried out to define similarity scales that are both useful and tractable.
6.2  The  Origin  of  Similarity  Functions

The model of fuzzy logic presented in this note is centered on the metric notion of similarity as a
primitive concept that is useful to explain the nature of possibilistic constructs and the meaning
of possibilistic reasoning. In this formulation, similarities are real-valued functions defined over
pairs of possible worlds.

From this perspective, similarities describe relations of resemblance between objects of high
complexity, which typically result from consideration of a large number of system variables. Reliance
on such complex structures has been the direct consequence of a research program that stressed
conceptual clarification as its primary objective. In practice, however, it will generally be difficult
to define measures that quantify similarity between complex objects on the basis of a large
number of criteria.

Similarities provide the framework that is required to understand approximate relations of
corelevance, usually stated as generalized conditional rules. The practical generation of similarity functions
typically proceeds, however, in the opposite direction, from separate statements about limited
aspects of system behavior to general metric structures. Once such resemblance measures are defined,
they may be used to express and acquire new laws of system behavior determined, for example, from
historical experience with similar systems. Furthermore, such similarity notions may be used as the
basis for analogical reasoning systems that try to determine system state on the basis of similarity
to known cases [23].

Perhaps the simplest mechanism that may be devised to generate complex metrics from simpler
ones is that which starts with measures of resemblance that quantify proximity from a limited
viewpoint. These metrics are usually derived, using a variety of techniques, in unsupervised pattern
classification (or clustering) problems [20]. In many important applications, hierarchical taxonomies
(a feature of many representation approaches in artificial intelligence) may be used, often in
connection with a variety of weighting schemes that quantify branching importance, to generate metrics
that often satisfy the more stringent requirements of an ultrametric [22].
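
As a rough sketch of how a weighted taxonomy may induce such a metric, the following example assigns to each pair of leaves a similarity that depends only on the depth of their lowest common ancestor. The taxonomy, the node names, and the depth weights are all hypothetical, chosen only for illustration. When the weights do not decrease with depth, the resulting similarity is min-transitive, which corresponds to the ultrametric inequality for the dissimilarity 1 - S.

```python
from itertools import product

# Hypothetical taxonomy: child -> parent (the root has parent None).
PARENT = {
    "vehicle": None,
    "aircraft": "vehicle",
    "ground": "vehicle",
    "fighter": "aircraft",
    "bomber": "aircraft",
    "tank": "ground",
    "truck": "ground",
}

# Hypothetical weights: similarity assigned to a pair whose lowest
# common ancestor sits at the given depth (root = depth 0).
DEPTH_SIMILARITY = {0: 0.2, 1: 0.6, 2: 1.0}


def ancestors(node):
    """Return the path from node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def similarity(a, b):
    """Similarity determined by the depth of the lowest common ancestor."""
    up_a = ancestors(a)
    up_b = set(ancestors(b))
    common = next(n for n in up_a if n in up_b)   # lowest common ancestor
    depth = len(ancestors(common)) - 1
    return DEPTH_SIMILARITY[depth]


LEAVES = ["fighter", "bomber", "tank", "truck"]


def is_min_transitive(items):
    """Check S(a, c) >= min(S(a, b), S(b, c)) for all triples."""
    return all(
        similarity(a, c) >= min(similarity(a, b), similarity(b, c))
        for a, b, c in product(items, repeat=3)
    )
```

Here `similarity("fighter", "bomber")` is 0.6 because the two leaves share the parent `aircraft`, while `similarity("fighter", "tank")` drops to 0.2 because they meet only at the root.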

Classification hierarchies such as these may be thought of as sets of general rules, having a
particularly useful structure, that specify interset proximity from relevant but restricted viewpoints,
eventually providing measures of similarity between variable values (i.e., the “leaves” of the
taxonomical tree). More generally, however, we may expect that sets of possibilistic rules (i.e., a general
knowledge base) defining a general semantic network of corelevance relations may be available as
the source for the determination of interobject proximity. These possibilistic semantic networks
resemble conventional semantic networks in most regards, being more general in that, in addition
to specifying knowledge about system behavior in some subsets of state space,¹³ they also specify
characteristics of behavior in neighborhoods of those subsets.

¹³ The expression “state space” is used loosely here to indicate the space defined by all system variables.

We may think, therefore, that the antecedents of implicational rules define general regions in state
space where the existence of relevant knowledge may increase insight through application of inferential
rules. Using Zadeh’s terminology, these antecedents define “granules” that identify important regions
of state space and indicate the level of accuracy (or granularity) that is required to perform effective
system analysis. In this case, the possibilistic granules correspond to fuzzy sets that are used to
specify both what is true in the core of the granule and, with decreasing specificity, what is true
in a nested set of its neighborhoods (i.e., the α-cuts). The ability to specify behavior using such
a topological structure results in inferential gains that are the direct consequence of our ability
to reason by similarity, an ability that is made possible by the approximate matching property
of the generalized modus ponens. From yet another perspective, the fuzzy granules identified by
possibilistic rules may also be thought of as generalizations of the arbitrary variable sets used in
a variety of artificial intelligence efforts aimed at understanding system behavior using qualitative
descriptions of reality [16].
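
The nested structure of a granule can be made concrete with a small sketch. It assumes a one-dimensional state variable and a trapezoidal membership function whose core and spread are hypothetical values; the names `membership` and `alpha_cut` are illustrative only.

```python
def membership(x, core=(4.0, 6.0), spread=2.0):
    """Trapezoidal membership: 1 on the core, falling linearly to 0 over `spread`."""
    lo, hi = core
    if lo <= x <= hi:
        return 1.0
    gap = lo - x if x < lo else x - hi    # distance from x to the core
    return max(0.0, 1.0 - gap / spread)


def alpha_cut(points, alpha):
    """Points whose membership is at least alpha."""
    return {x for x in points if membership(x) >= alpha}


# Sample grid 0.0, 0.5, ..., 10.0 and three cuts of decreasing specificity.
POINTS = [i * 0.5 for i in range(21)]
CUTS = {alpha: alpha_cut(POINTS, alpha) for alpha in (1.0, 0.5, 0.25)}
```

The cut at level 1.0 is the core of the granule, and each lower level yields a larger neighborhood that contains all the previous ones, so `CUTS[1.0] <= CUTS[0.5] <= CUTS[0.25]` holds as a chain of set inclusions.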

A number of heuristics may be easily formulated to integrate “marginal” measures of resemblance
into joint similarity relations. More generally, however, we may state the problem of similarity
construction as that of defining metric structures on the basis of knowledge of the aspects of system
behavior that are important to its understanding, i.e., the previously mentioned granules, which
define what must be distinguished. Since those granules are generally fuzzy sets, the relevance to
similarity construction of the following representation theorem, due to Valverde, may be immediately
seen:


Theorem [Valverde]: A binary function S mapping pairs of objects of a universe of discourse U
into [0, 1] is a similarity relation if and only if there exists a family X of fuzzy subsets of U such
that

$$ S(w, w') = \inf_{h \in X} \min\bigl[\, h(w) \oslash h(w'),\; h(w') \oslash h(w) \,\bigr] $$

for all w and w' in U, where the infimum is taken over all fuzzy subsets h in the family X.
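
A minimal numerical sketch of the construction in the theorem follows. It makes two assumptions that are not fixed by the statement above: the triangular norm is taken to be the Lukasiewicz norm, and the operation between membership values is read as its pseudoinverse, i.e., the largest c such that the norm of b and c does not exceed a. The universe and the family of fuzzy subsets are hypothetical.

```python
from itertools import product

UNIVERSE = ["w1", "w2", "w3", "w4"]

# Hypothetical family of fuzzy subsets (membership functions) of UNIVERSE.
FAMILY = [
    {"w1": 1.0, "w2": 0.75, "w3": 0.25, "w4": 0.0},
    {"w1": 0.5, "w2": 0.5, "w3": 1.0, "w4": 0.25},
]


def tnorm(a, b):
    """Lukasiewicz triangular norm (an assumed choice)."""
    return max(0.0, a + b - 1.0)


def pseudoinverse(a, b):
    """Largest c with tnorm(b, c) <= a, for the Lukasiewicz norm."""
    return min(1.0, 1.0 - b + a)


def similarity(w, v, family=FAMILY):
    """Infimum over the family of the two-sided pseudoinverse of memberships."""
    return min(
        min(pseudoinverse(h[w], h[v]), pseudoinverse(h[v], h[w]))
        for h in family
    )


def is_similarity(universe):
    """Check reflexivity, symmetry, and tnorm-transitivity on all triples."""
    for a, b, c in product(universe, repeat=3):
        if similarity(a, a) != 1.0:
            return False
        if similarity(a, b) != similarity(b, a):
            return False
        if similarity(a, c) < tnorm(similarity(a, b), similarity(b, c)):
            return False
    return True
```

With this choice of norm, each term reduces to one minus the absolute difference of the two membership values, so two objects are similar to the extent that no fuzzy set in the family tells them apart. The membership values are multiples of 0.25 so that the floating-point comparisons in `is_similarity` are exact.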


Besides its obvious relevance to the generation of similarity relations from knowledge of important
sets in the domain of discourse, Valverde’s theorem, which resulted originally from studies in pattern
recognition, is also of potential significance to the solution of knowledge acquisition problems
because of the important relations that exist between learning procedures and structure-discovery
techniques such as cluster analysis.
7  CONCLUSION


This note has presented a similarity-based model that provides a clear interpretation of the major
structures and methods of possibilistic logic using metric concepts that are formally different from
the set-measure constructs of probability theory. Regardless of the potential existence, so far
unestablished, of probability-based interpretations for possibilistic structures, this metric model makes
clear that there are no compelling reasons to conflate two rather different aspects of uncertainty into
a single notion simply because one’s favorite theoretical framework, in spite of its otherwise many
remarkable virtues, fails to fully capture reality.

Succinctly stated, being in a situation that resembles a state of affairs S does not make S likely, or
vice versa. Furthermore, our reference state may not even be possible in the current circumstances
(making it completely unlikely), but we may still find it useful as a comparison landmark. This
use of “impossible” examples as a way to illustrate system behavior is very prevalent in human
culture, being exemplified by such utterances as “he had the strength of a horse and the swiftness
of a swallow,” even if it is obvious to all that no such beasts exist other than for such metaphorical
purposes.

The  insight  provided  by  this  model  makes  it  rather  obvious  that  very  little  can  be  gained  by 
continuing  to  assert  a  potential — although  never  revealed — encompassing  probabilistic  interpretation 
for  possibilistic  structures  that,  presumably,  would  render  them  unnecessary  as  serious  objects  of 
scientific  discourse.  In  addition,  and  quite  beyond  whatever  understanding  theory  may  provide,  the 
current  success  of  possibilistic  logic  as  the  basis  for  major  systems  of  important  human  value  [41] 
— often  unmatched  by  other  approaches — should  be  enough  to  convince  those  having  more  pragmatic 
perspectives  as  to  its  utility. 

The  task  for  approximate  reasoning  researchers  is  to  proceed  now  beyond  unnecessary  controversy 
into  the  study  of  the  issues  that  arise  from  models  such  as  the  one  presented  in  this  note.  Among 
such  questions,  further  studies  of  the  relations  between  the  notions  of  possibility,  similarity,  and 
negation  and  of  those  between  probability  and  possibility  are  of  major  importance. 